------- Comment #5 from jakub at gcc dot gnu dot org  2010-04-20 10:23 -------

For performance reasons, libgomp uses some busy waiting. That works well when there are idle CPUs and cycles to burn (it decreases latency a lot), but if you have more threads than CPUs it can make things worse, because spinning threads consume cycles that the threads they are waiting for could have used.

You can tweak this through the OMP_WAIT_POLICY and GOMP_SPINCOUNT environment variables. The implementation recognizes two kinds of spin counts: normal and throttled, the latter being used when the number of threads is bigger than the number of available CPUs. Even so, in some cases the throttled default might still be too large: it is 1000 spins with OMP_WAIT_POLICY=active and 100 spins when OMP_WAIT_POLICY is not set in the environment.
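A minimal Python model of the policy described above may make the two spin counts easier to follow. It is a sketch under stated assumptions, not libgomp's code (libgomp is written in C and its real wait path is different). The names `effective_spin_count` and `wait_for` are hypothetical. The throttled defaults of 1000 and 100 come from the comment; treating `passive` as zero spins and never raising a lower explicit spin count are assumptions, and the normal spin count is a parameter because its default is not given here.

```python
import os
import threading

# Throttled spin counts quoted in the bug comment, used when there are
# more threads than available CPUs.
THROTTLED_ACTIVE = 1000  # OMP_WAIT_POLICY=active
THROTTLED_UNSET = 100    # OMP_WAIT_POLICY not set


def effective_spin_count(policy, normal_spins, nthreads, ncpus):
    """Hypothetical model of choosing a spin budget (not libgomp's code).

    policy: value of OMP_WAIT_POLICY, or None if it is unset.
    normal_spins: the non-throttled count (e.g. from GOMP_SPINCOUNT);
    its default is not stated in the comment, so it is a parameter here.
    """
    p = policy.lower() if policy else None
    if p == "passive":
        return 0  # assumption: passive blocks without spinning
    if nthreads > ncpus:
        # Oversubscribed: switch to the much smaller throttled count.
        throttled = THROTTLED_ACTIVE if p == "active" else THROTTLED_UNSET
        # Assumption: an explicitly lower spin count is not raised.
        return min(normal_spins, throttled)
    return normal_spins


def wait_for(flag, spins):
    """Spin-then-block wait on a threading.Event.

    Polls the flag up to `spins` times (busy waiting: low latency, but
    burns CPU), then falls back to blocking. Returns True if the flag
    was seen while spinning, False if the caller had to block.
    """
    for _ in range(spins):
        if flag.is_set():
            return True
    flag.wait()
    return False


if __name__ == "__main__":
    spins = effective_spin_count(
        os.environ.get("OMP_WAIT_POLICY"),
        normal_spins=100_000,  # hypothetical value for illustration
        nthreads=8,
        ncpus=os.cpu_count() or 1,
    )
    done = threading.Event()
    worker = threading.Thread(target=done.set)
    worker.start()
    print("spins:", spins, "seen while spinning:", wait_for(done, spins))
    worker.join()
```

Spinning in Python only models the control flow; the latency benefit shows up in native code. The trade-off is still visible in `wait_for`: each extra poll can catch a wakeup sooner, but when threads outnumber CPUs those polls take time away from the thread being waited for, which is why the throttled budget is so much smaller.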
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706