On Sat, 2014-02-22 at 16:21 +0000, "Nordlöw" wrote:
> In the test code for std.parallelism given below, I get some
> interesting results when compiled as
>
>   dmd -release -noboundscheck -O -inline -w -wi
>       ~/Work/justd/t_parallelism.d -oft_parallelism
>
> My scalability measurements say the following:
>
>   3.14159 took 221[ms]
>   3.14159 took 727[ms]
>   Speedup 3.28959
>   -5.80829e+09 took 33[ms]
>   -5.80829e+09 took 201[ms]
>   Speedup 6.09091
>
> Why do I get a larger speedup for the simpler map function?
> Shouldn't it be the opposite?
I'm not sure just now, but it is an interesting issue. On my seven-year-old
twin 2.33GHz Xeon, I get:

  3.14159 took 128[ms]
  3.14159 took 860[ms]
  Speedup 6.71875
  -5.80829e+09 took 28[ms]
  -5.80829e+09 took 302[ms]
  Speedup 10.7857

> I've always read that the more calculations I perform on each
> memory access the better the speedup...

Not necessarily; it depends on the caches and lots of other fun things.

> Anyhow the speedups are great!
>
> I'm sitting on an Intel quad core with 8 hyperthreads.

As anyone involved in benchmarking native CPU-bound code will tell you,
hyperthreads are generally a waste of time. So I'd expect a "flat out"
speedup of 4 on your machine and 8 on mine. Virtual machines, e.g. the JVM
and PVM, can sometimes do interesting things that make hyperthreads useful.

> Sample code follows:

Hey, I recognize the core of this code, it's my π by quadrature example (a
sketch of the pattern is appended below). Sadly, π is not well approximated
by -5.80829e+09 :-) I shall tinker with this a bit, as I want to understand
why the speedup is greater than the number of cores; it implies overhead in
one of the algorithms.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
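For reference, the snipped sample is essentially the π-by-quadrature example
from the std.parallelism documentation: sum the midpoint-rule terms of
4/(1 + x^2) over [0, 1], once in a plain sequential loop and once with
taskPool.reduce over the mapped terms. The sketch below is a reconstruction
of that pattern, not the original code: the problem size n, the timing
harness, and the output format are all assumptions, and it targets a current
DMD (std.datetime.stopwatch, with -boundscheck=off in place of the deprecated
-noboundscheck flag).

  import std.algorithm : map;
  import std.datetime.stopwatch : AutoStart, StopWatch;
  import std.parallelism : taskPool, totalCPUs;
  import std.range : iota;
  import std.stdio : writefln;

  void main()
  {
      enum n = 100_000_000;       // problem size: an assumption
      immutable delta = 1.0 / n;

      // One midpoint-rule term of the integral of 4/(1 + x^2) on [0, 1].
      double term(int i)
      {
          immutable x = (i + 0.5) * delta;
          return delta / (1.0 + x * x);
      }

      // Parallel: reduce the lazily mapped terms on the default task pool.
      auto sw = StopWatch(AutoStart.yes);
      immutable parPi = 4.0 * taskPool.reduce!"a + b"(iota(n).map!term);
      immutable parMs = sw.peek.total!"msecs";

      // Sequential baseline: the same terms accumulated in a plain loop.
      sw.reset();
      double sum = 0.0;
      foreach (i; 0 .. n)
          sum += term(i);
      immutable seqPi = 4.0 * sum;
      immutable seqMs = sw.peek.total!"msecs";

      writefln("%s took %s[ms] (parallel on %s logical CPUs)",
               parPi, parMs, totalCPUs);
      writefln("%s took %s[ms] (sequential)", seqPi, seqMs);
      writefln("Speedup %s", cast(double) seqMs / parMs);
  }

Note that totalCPUs counts logical CPUs, hyperthreads included, and the
default taskPool is sized at totalCPUs - 1 worker threads plus the thread
that submits the work, so the "flat out" expectations above refer to
physical cores rather than to the number of threads the pool actually runs.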