Chris, if you assume your cpusets are correct and you are not doing any hybrid thread+MPI, I found the problem is avoided if you enable -bind-to-core with Open MPI 1.6.x.
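For anyone wanting to check whether they are hitting the same symptom (two ranks sharing one core), here is a minimal sketch, not anything from our setup. It assumes you have already collected a mapping of process ID to the core each process was last seen on, for example from hwloc-ps or the psr column of ps. The function name and the sample PIDs are made up for illustration.

```python
from collections import defaultdict


def find_shared_cores(placement):
    """Return {core: [pids]} for every core hosting more than one process.

    `placement` maps a process ID to the core it was last seen on.
    This input is hypothetical; gather it however suits your site.
    """
    by_core = defaultdict(list)
    for pid, core in placement.items():
        by_core[core].append(pid)
    # Keep only the cores that more than one process landed on.
    return {core: sorted(pids) for core, pids in by_core.items() if len(pids) > 1}


# Made-up sample: PIDs 4101 and 4102 both ended up on core 3.
sample = {4100: 0, 4101: 3, 4102: 3, 4103: 5}
print(find_shared_cores(sample))
```

An empty result means no two processes were seen on the same core at the time the placement was sampled. It says nothing about whether they stay put unless binding is enabled.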
We just don't enable binding by default on our setup and thus far no users have been bit by this.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Nov 5, 2012, at 9:00 PM, Christopher Samuel wrote:

> On 06/11/12 08:57, Brock Palen wrote:
>
>> Ok more information (had to build newer hwloc) My job today only
>> 2 processes are running at half speed and they indeed are sharing
>> the same core:
>
> We've seen the same occasionally using CentOS5/RHEL5 with jobs running
> under Torque with cpusets enabled.
>
> Never been able to explain it and the most recent case was someone
> using a home compiled version of NAMD, the problem disappeared when
> they started using our provided builds.
>
> I was fixing up the running problem jobs by hand by assigning procs to
> individual cores on the nodes with cpusets. :-/
>
> cheers,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci