On Sunday, 18 February 2018 at 17:54:58 UTC, SrMordred wrote:
I've been experimenting with threads and related topics recently.
(I've only just started, so there may be some terrible mistakes here.)
With this base work:
foreach (i; 0 .. SIZE)
{
    results[i] = values1[i] * values2[i];
}
and then with three other methods (parallel, spawn and
Threads), these were my results:
_base : 456 ms and 479 us
_parallel : 331 ms, 324 us, and 4 hnsecs
_concurrency : 367 ms, 348 us, and 2 hnsecs
_thread : 369 ms, 565 us, and 3 hnsecs
(code here : https://run.dlang.io/is/2pdmmk )
All methods show only minor speedups. I was expecting a lot
more.
Since I have 7 cores, I expected something below 100 ms.
I don't think I'm seeing false sharing in this case, or am I wrong?
If someone can expand on this, I'll be grateful.
Thanks!
As SIZE=1024*1024 (i.e. not much, possibly well within L2 cache
for 32bit), it may be that the concurrency overhead is
significant relative to the work itself. Also, the run.dlang.io
link has no -O flag and thus no optimisations.
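
To make the overhead point concrete, here is a minimal sketch (not the code from the run.dlang.io link) that compares the serial loop with a std.parallelism version using an explicit work unit size. The function names, the int element type and the "one block per CPU" work unit are my assumptions for illustration; larger work units mean fewer task hand-offs per element.

```d
import std.algorithm.comparison : max;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : parallel, totalCPUs;
import std.stdio : writeln;

// Serial baseline, same shape as the loop in the original post.
// (int elements are an assumption; the post does not state the type.)
void multiplyBase(const int[] a, const int[] b, int[] r)
{
    foreach (i; 0 .. r.length)
        r[i] = a[i] * b[i];
}

// Parallel variant: each task processes `workUnit` consecutive
// elements of r, so scheduling cost is paid once per block.
void multiplyChunked(const int[] a, const int[] b, int[] r, size_t workUnit)
{
    foreach (i, ref x; parallel(r, workUnit))
        x = a[i] * b[i];
}

void main()
{
    enum SIZE = 1024 * 1024;
    auto a = new int[SIZE];
    auto b = new int[SIZE];
    auto r = new int[SIZE];
    a[] = 3;
    b[] = 7;

    // Hypothetical choice: roughly one block per logical CPU.
    immutable workUnit = max(size_t(1), r.length / totalCPUs);

    auto sw = StopWatch(AutoStart.yes);
    multiplyBase(a, b, r);
    writeln("_base    : ", sw.peek);

    sw.reset();
    multiplyChunked(a, b, r, workUnit);
    writeln("_chunked : ", sw.peek);
}
```

Build it with -O (for example dmd -O) to get optimised numbers; a single pass over 1M elements is short enough that the parallel version may not win at all.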
Without -O I get
_base : 323 ms, 92 μs, and 6 hnsecs
_parallel : 276 ms, 649 μs, and 3 hnsecs
_concurrency : 221 ms, 931 μs, and 7 hnsecs
_thread : 212 ms, 277 μs, and 3 hnsecs
With -O I get
_base : 150 ms, 728 μs, and 5 hnsecs
_parallel : 120 ms, 78 μs, and 5 hnsecs
_concurrency : 134 ms, 787 μs, and 4 hnsecs
_thread : 129 ms, 476 μs, and 2 hnsecs
With SIZE=16*1024*1024 and without -O I get
_base : 5 secs, 835 ms, 240 μs, and 9 hnsecs
_parallel : 4 secs, 802 ms, 279 μs, and 8 hnsecs
_concurrency : 2 secs, 133 ms, 685 μs, and 3 hnsecs
_thread : 2 secs, 108 ms, 860 μs, and 9 hnsecs
With SIZE=16*1024*1024 and with -O I get
_base : 2 secs, 502 ms, 523 μs, and 4 hnsecs
_parallel : 1 sec, 769 ms, 945 μs, and 3 hnsecs
_concurrency : 1 sec, 362 ms, 747 μs, and 1 hnsec
_thread : 1 sec, 335 ms, 720 μs, and 1 hn