[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #9 from FH fh_p at hotmail dot com 2012-05-12 21:27:31 UTC --- Well... I tested an OpenMP benchmark (designed to demonstrate OpenMP performance) found on the web: multi-threaded (OpenMP) is again slower than single-threaded. I tried coding it with pthreads: same thing. So, I have a dual-core hyper-threaded PC, I end up with multi-threaded applications slower than single-threaded, and this is supposed to be normal behavior?! Anyway, this is still illogical to me.
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #1 from FH fh_p at hotmail dot com 2012-05-09 09:17:29 UTC --- I am not sure whether this problem is related to gcc or to Ubuntu. I started with the assumption that it is more likely related to gcc.
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #2 from FH fh_p at hotmail dot com 2012-05-09 10:16:52 UTC --- I have just tested on another computer (CPU: Xeon5650, 12 cores; OS: Scientific Linux) and I reproduce the unexpected behavior (OpenMP slower than single-threaded). So I believe the problem is related to gcc rather than to the OS. When I use more threads (export OMP_NUM_THREADS=2, then 6, then 12), OpenMP gets even slower relative to single-threaded. (Is this behavior related to thread initialisation?)
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

Jakub Jelinek jakub at gcc dot gnu.org changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
CC         |            |jakub at gcc dot gnu.org
Resolution |            |INVALID

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org 2012-05-09 12:15:53 UTC ---
This is just a bad test. You are storing the values in the different threads, but then reading everything in a single thread only.
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #4 from FH fh_p at hotmail dot com 2012-05-09 12:53:46 UTC --- I don't understand your answer. The timing covers only the for loop. Checking the array content is single-threaded: it is there to make sure the for loop has done its job correctly, and this check is not timed. The array to initialize is shared between threads (shared by default) and not private to each thread. To me, the test seems relevant. If it's not, why? And how should I modify it?
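Editor's sketch: the original test attachment is not reproduced in this thread, so the following is only a guess at its shape based on the description above: a timed OpenMP loop that stores into a shared array, followed by an untimed single-threaded check. The names now_sec, timed_fill and count_mismatches are hypothetical, and the use of clock_gettime for timing is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (hypothetical helper). */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Timed region: only the store loop is measured.
   The array `a` is shared between threads by default;
   the loop index is private to each thread. */
double timed_fill(double *a, long n)
{
    long i;
    double t0 = now_sec();
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = 1.0;
    return now_sec() - t0;
}

/* Untimed, single-threaded check: number of elements that are not 1.0. */
long count_mismatches(const double *a, long n)
{
    long bad = 0;
    for (long i = 0; i < n; i++)
        if (a[i] != 1.0)
            bad++;
    return bad;
}
```

Built with gcc -fopenmp, the loop in timed_fill is split across OMP_NUM_THREADS threads; built without that flag, the pragma is ignored and the same loop runs on one thread, which gives the single-threaded baseline for comparison.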
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

FH fh_p at hotmail dot com changed:

What       |Removed  |Added
Status     |RESOLVED |UNCONFIRMED
Resolution |INVALID  |

--- Comment #5 from FH fh_p at hotmail dot com 2012-05-09 12:55:56 UTC ---
To me, the test seems relevant. If it's not, why? And how should I modify it?
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

Jakub Jelinek jakub at gcc dot gnu.org changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |            |INVALID

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2012-05-09 13:29:00 UTC ---
Sorry, I missed that your measurement doesn't include the single-threaded loop. Anyway, the test is still not relevant: it is purely memory bound. As you can see from running it with very small arguments, the thread creation and omp for initial overhead are in the noise; what you see is just how the cache hierarchy of your CPU works. The inner loop, in which all the measured time is spent, is very similar in both versions (and even hand-editing it to be identical doesn't help at all).
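Editor's sketch: to illustrate the "purely memory bound" point, the hypothetical kernel below does a tunable amount of arithmetic per element before each store. With iters set to 0 it degenerates into a plain store loop like the one being discussed; with a larger iters the work per element is dominated by computation rather than memory traffic, which is the kind of loop where extra threads are more likely to help. The function name heavy_fill and the choice of Newton's iteration for sqrt(2) are illustrative assumptions, not part of the original test.

```c
#include <stddef.h>

/* Fill a[0..n-1], spending `iters` Newton steps per element.
   Each step refines x toward sqrt(2) via x = (x + 2/x) / 2.
   iters == 0: nothing but a store per element (memory bound).
   large iters: arithmetic dominates each store (compute bound). */
void heavy_fill(double *a, long n, int iters)
{
    long i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* Start value depends on i so the work cannot be hoisted
           out of the loop as a single constant computation. */
        double x = 1.0 + (double)(i % 7);
        for (int k = 0; k < iters; k++)
            x = 0.5 * (x + 2.0 / x);
        a[i] = x;
    }
}
```

Timing heavy_fill for the same n with iters at 0 and at a few hundred, under different OMP_NUM_THREADS values, is one way to see whether a slowdown is specific to the store-only case on a given machine.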
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #7 from FH fh_p at hotmail dot com 2012-05-09 14:36:00 UTC --- Well... I still don't really get why it is not possible to improve performance for such basic things. I tried with allocations up to 7 GB or more (RAM full + swap full): I still get the same result, which looks unexpected to me. Anyway, I guess I won't be able to get the logic of this...
[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292 --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2012-05-09 15:01:24 UTC ---
Just try equivalent pthread program and you'll note the same behavior.

#include <pthread.h>
#include <stdlib.h>

double *p;
int c;

void *
tf (void *x)
{
  int i, s = ((long) x) * c, e = s + c;
  for (i = s; i < e; i++)
    p[i] = 1.0;
  return NULL;
}

int
main (int argc, char **argv)
{
  int n = atoi (argv[1]), i;
  int sz = atoi (argv[2]);
  if (n > 32 || n < 1 || sz < 128 || (sz % n) != 0)
    return 1;
  p = malloc (sz * sizeof (double));
  if (p == NULL)
    return 1;
  c = sz / n;
  pthread_t t[32];
  for (i = 1; i < n; i++)
    pthread_create (&t[i], NULL, tf, (void *)(long) i);
  tf ((void *) 0L);
  for (i = 1; i < n; i++)
    pthread_join (t[i], NULL);
  return 0;
}