Chris,

If you assume your cpusets are correct and you are not doing any hybrid
thread+MPI, I found that the problem is avoided if you enable -bind-to-core
with Open MPI 1.6.x.
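
For anyone who wants to check a running job for this, here is a rough sketch
(not something we run in production) that compares the allowed-CPU lists of a
set of processes and reports any that overlap. It assumes Linux, where
/proc/<pid>/status has a Cpus_allowed_list line in the usual "0-3,8" form. The
function names are my own for illustration. An overlap only means two ranks
are really sharing a core when each is bound to a single core; unbound
processes will overlap on every core in the cpuset.

```python
from itertools import combinations


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_allowed_cpus(pid):
    """Return the set of CPUs a process may run on (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpulist(line.split(":", 1)[1])
    raise ValueError(f"no Cpus_allowed_list entry for pid {pid}")


def shared_cores(bindings):
    """Given {pid: set_of_cpus}, list (pid_a, pid_b, common_cpus) overlaps."""
    clashes = []
    for a, b in combinations(sorted(bindings), 2):
        common = bindings[a] & bindings[b]
        if common:
            clashes.append((a, b, sorted(common)))
    return clashes
```

You would feed it the pids of the MPI ranks on a node, for example
shared_cores({pid: read_allowed_cpus(pid) for pid in pids}), and an empty list
back means no two ranks are allowed on the same core.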

We just don't enable binding by default on our setup, and thus far no users
have been bitten by this.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Nov 5, 2012, at 9:00 PM, Christopher Samuel wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 06/11/12 08:57, Brock Palen wrote:
> 
>> Ok more information (had to build newer hwloc)  My job today only
>> 2 processes are running at half speed and they indeed are sharing
>> the same core:
> 
> We've seen the same occasionally using CentOS5/RHEL5 with jobs running
> under Torque with cpusets enabled.
> 
> Never been able to explain it and the most recent case was someone
> using a home compiled version of NAMD, the problem disappeared when
> they started using our provided builds.
> 
> I was fixing up the running problem jobs by hand by assigning procs to
> individual cores on the nodes with cpusets.  :-/
> 
> cheers,
> Chris
> - -- 
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://www.enigmail.net/
> 
> iEYEARECAAYFAlCYb1sACgkQO2KABBYQAh/OGACeNL7bow7z26El31zIg16q+tPw
> toIAnigL5SHhZXM42DGY3M2Ewt6PUNIk
> =/bNA
> -----END PGP SIGNATURE-----
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users

