[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-12 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

--- Comment #9 from FH fh_p at hotmail dot com 2012-05-12 21:27:31 UTC ---
Well...

I tested an OpenMP benchmark (designed to demonstrate OpenMP performance) found
on the web: multi-threaded (OpenMP) is again slower than single-threaded. I
also tried coding with pthreads: same thing.

So, I have a dual-core hyper-threaded PC, I end up with multi-threaded
applications that are slower than single-threaded ones, and this is supposed
to be normal behavior?! Anyway, this is still illogical to me.


[Bug c++/53292] New: multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

         Bug #: 53292
       Summary: multi-threaded (OpenMP) is slower than single-threaded
Classification: Unclassified
       Product: gcc
       Version: 4.6.2
        Status: UNCONFIRMED
      Severity: normal
      Priority: P3
     Component: c++
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: f...@hotmail.com


Created attachment 27352
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27352
cpp file to compile

Hello,

The problem is: multi-threaded (OpenMP) is slower than single-threaded.

The test has been run several times in succession.
A chunk size is imposed to avoid false sharing.
The test ran on Ubuntu 12.04 64-bit / 4 GB RAM / Core i3 CPU (dual-core,
hyper-threaded: 2 physical cores, 4 hardware threads) / gcc version 4.6.2.
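The attachment itself is not reproduced in this thread, so the following is only a sketch of what "imposing a chunk size to avoid false sharing" could look like. The names `kCacheLineBytes`, `elems_per_cache_line` and `fill_chunked` are hypothetical, and the 64-byte cache line is an assumption about the CPU, not something stated in the report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86, not taken from the report).
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical helper: number of elements of the given size per cache line.
constexpr std::size_t elems_per_cache_line(std::size_t elem_size) {
    return elem_size >= kCacheLineBytes ? 1 : kCacheLineBytes / elem_size;
}

// Fill data[i] = i. Each thread receives blocks of `chunk` consecutive
// iterations, where `chunk` is a whole number of cache lines, so two threads
// rarely write to the same line. Chunk boundaries coincide exactly with line
// boundaries only if the array start is cache-line aligned.
void fill_chunked(std::vector<int>& data) {
    const long n = static_cast<long>(data.size());
    const int chunk = static_cast<int>(elems_per_cache_line(sizeof(int)) * 64);
    assert(chunk > 0);
    int* p = data.data();
    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
    }
}
```

Built with `g++ -fopenmp`, the loop is split across threads; without that flag the pragma is ignored and the same code runs single-threaded, which gives a direct baseline for comparison.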

Is the problem related to the CPU? Could it be that the hyper-threaded i3 is
not handled correctly by g++ or Ubuntu?

Can somebody help or give me a clue?

Thanks,

FH

PS: I am not used to posting bugs; if this one has not been posted in the
right place, feel free to tell me where to post it.


[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

--- Comment #1 from FH fh_p at hotmail dot com 2012-05-09 09:17:29 UTC ---
I am not sure whether this problem is related to gcc or to Ubuntu. I started
with the assumption that it is more likely related to gcc.


[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

--- Comment #2 from FH fh_p at hotmail dot com 2012-05-09 10:16:52 UTC ---
I have just tested on another computer (CPU: Xeon 5650, 12 cores; OS:
Scientific Linux) and I reproduce the unexpected behavior (OpenMP slower than
single-threaded).

So, I believe the problem is related to gcc rather than to the OS.

When I use more threads (export OMP_NUM_THREADS=2, then 6, then 12), OpenMP
gets even slower compared to single-threaded. (Is this behavior related to
thread initialisation?)
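One way to probe the thread-initialisation question is to time parallel regions that do almost no work, for several thread counts. The sketch below is an assumption-laden illustration, not code from the attachment: `RegionCost` and `time_empty_regions` are hypothetical names, and the `num_threads` clause stands in for setting `OMP_NUM_THREADS` in the environment.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical result type for this sketch.
struct RegionCost {
    long entries;    // total number of times any thread ran the region body
    double seconds;  // wall-clock time for all repetitions
};

// Enter a nearly empty parallel region `reps` times with the requested number
// of threads. The elapsed time is dominated by starting and joining the thread
// team plus the atomic increment, so it approximates per-region overhead.
RegionCost time_empty_regions(int threads, int reps) {
    assert(threads > 0 && reps > 0);
    long entries = 0;  // shared by all threads in the region
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        #pragma omp parallel num_threads(threads)
        {
            #pragma omp atomic
            ++entries;
        }
    }
    const auto t1 = std::chrono::steady_clock::now();
    return {entries, std::chrono::duration<double>(t1 - t0).count()};
}
```

Calling `time_empty_regions(n, 1000)` for n = 1, 2, 6, 12 and dividing `seconds` by 1000 gives a rough per-region cost to compare against the duration of the timed loop. The runtime may provide fewer threads than requested, so `entries` can be smaller than `threads * reps`.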


[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

--- Comment #4 from FH fh_p at hotmail dot com 2012-05-09 12:53:46 UTC ---
I don't understand your answer.

The timing covers only the for loop. The check of the array content is
single-threaded: it is there to make sure the for loop has done its job
correctly, and this check is not timed. The array being initialized is shared
by the threads (shared by default) and not private to each thread.
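The structure described above can be sketched as follows. This is a reconstruction under assumptions, since the attached file is not shown in the thread: `timed_init` and `all_equal` are hypothetical names, and the choice of `std::chrono::steady_clock` (wall-clock time) is this sketch's, not necessarily the attachment's.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Time only the parallel initialization loop; returns wall-clock seconds.
// The array is shared by all threads, which is the OpenMP default for
// variables declared outside the parallel region.
double timed_init(std::vector<double>& a, double value) {
    assert(!a.empty());  // nothing to time on an empty array
    const long n = static_cast<long>(a.size());
    double* p = a.data();
    const auto t0 = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        p[i] = value;
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Single-threaded verification, deliberately kept outside the timed region.
bool all_equal(const std::vector<double>& a, double value) {
    for (double x : a) {
        if (x != value) return false;
    }
    return true;
}
```

A wall-clock timer is used here because a CPU-time function such as `std::clock()` accumulates processor time for the whole process, so with several threads it can report more time than actually elapsed. Whether that applies to the attached test cannot be told from the thread.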

To me, the test seems relevant. If it's not, why? And how should it be modified?


[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

FH fh_p at hotmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

--- Comment #5 from FH fh_p at hotmail dot com 2012-05-09 12:55:56 UTC ---
To me, the test seems relevant. If it's not, why? And how should it be modified?


[Bug c++/53292] multi-threaded (OpenMP) is slower than single-threaded

2012-05-09 Thread fh_p at hotmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53292

--- Comment #7 from FH fh_p at hotmail dot com 2012-05-09 14:36:00 UTC ---
Well...

I still don't really get why it is not possible to improve performance for
such a basic operation. I tried allocations of up to 7 GB or more (RAM full +
swap full): I still get the same result, which looks unexpected to me.

Anyway, I guess I won't be able to understand the logic of this...