On Sunday, 18 February 2018 at 17:54:58 UTC, SrMordred wrote:
I'm experimenting with threads and related features recently.
(I've just started, so there may be some terrible mistakes here.)

With this base work:

foreach(i ; 0 .. SIZE)
{
    results[i] = values1[i] * values2[i];
}

and then with these 3 other methods: parallel, spawn, and Thread.
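For reference, the parallel version presumably looks roughly like this (a sketch only, assuming `values1`, `values2`, and `results` are arrays of equal length SIZE as in the base loop; not a verbatim copy of the linked code):

```d
import std.parallelism : parallel;

// Sketch of the _parallel variant: the default taskPool's worker
// threads split the iteration range among themselves.
foreach (i, ref r; parallel(results))
{
    r = values1[i] * values2[i];
}
```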

these were my results:

_base : 456 ms and 479 μs
_parallel : 331 ms, 324 μs, and 4 hnsecs
_concurrency : 367 ms, 348 μs, and 2 hnsecs
_thread : 369 ms, 565 μs, and 3 hnsecs

(code here : https://run.dlang.io/is/2pdmmk )

All methods show only minor speedup gains; I was expecting a lot more.
Since I have 7 cores, I expected something below 100 ms.

I'm not seeing false sharing in this case, or am I wrong?

If someone can expand on this, I'll be grateful.

Thanks!

Since SIZE = 1024*1024 isn't much data (possibly well within the L2 cache for 32-bit values), the overhead of setting up the concurrency may account for a significant share of the runtime. Also, the run.dlang.io link passes no -O flag, so no optimisations are enabled.
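When compiling locally, optimisations can be turned on like this (the source file name here is just a placeholder):

```
dmd -O -inline -release bench.d
ldc2 -O3 -release bench.d    # LDC typically optimises loops like this better than DMD
```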

Without -O I get:

_base : 323 ms, 92 μs, and 6 hnsecs
_parallel : 276 ms, 649 μs, and 3 hnsecs
_concurrency : 221 ms, 931 μs, and 7 hnsecs
_thread : 212 ms, 277 μs, and 3 hnsecs

With it I get:
_base : 150 ms, 728 μs, and 5 hnsecs
_parallel : 120 ms, 78 μs, and 5 hnsecs
_concurrency : 134 ms, 787 μs, and 4 hnsecs
_thread : 129 ms, 476 μs, and 2 hnsecs

With SIZE = 16*1024*1024 without -O I get:
_base : 5 secs, 835 ms, 240 μs, and 9 hnsecs
_parallel : 4 secs, 802 ms, 279 μs, and 8 hnsecs
_concurrency : 2 secs, 133 ms, 685 μs, and 3 hnsecs
_thread : 2 secs, 108 ms, 860 μs, and 9 hnsecs

With SIZE = 16*1024*1024 with -O I get:
_base : 2 secs, 502 ms, 523 μs, and 4 hnsecs
_parallel : 1 sec, 769 ms, 945 μs, and 3 hnsecs
_concurrency : 1 sec, 362 ms, 747 μs, and 1 hnsec
_thread : 1 sec, 335 ms, 720 μs, and 1 hnsec

