[OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-18 Thread Brock Palen
I have started using 1.8.1 for some codes (meep in this case). It sometimes 
works fine, but in a few cases I am seeing ranks being given overlapping CPU 
assignments.

Example job, default binding options (so by-core, right?):

Assigned nodes are listed below; the one in question is nyx5398. We use torque 
CPU sets and use TM to spawn.

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]

[root@nyx5398 ~]# hwloc-bind --get --pid 16065
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16066
0x0800
[root@nyx5398 ~]# hwloc-bind --get --pid 16067
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16068
0x0800
  
[root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
8-11

So torque claims the CPU set for the job has 4 cores, but as you can see the 
four ranks were given only two distinct bindings (0x0200 and 0x0800), each 
shared by two ranks.
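
For reference, a minimal Python sketch (not part of the original session) that 
decodes the masks reported by hwloc-bind above and groups the pids by the PUs 
they are bound to. The helper name mask_to_pus is made up for illustration, and 
the sketch assumes each mask is a single hex word, as in the output above.

```python
from collections import defaultdict

def mask_to_pus(mask_hex):
    """Return the sorted PU indices set in a hex bitmask such as '0x0200'."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

# pid -> binding, copied from the hwloc-bind --get --pid output above
bindings = {16065: "0x0200", 16066: "0x0800", 16067: "0x0200", 16068: "0x0800"}

# Group pids by the exact set of PUs they are bound to
pids_by_pus = defaultdict(list)
for pid, mask in bindings.items():
    pids_by_pus[tuple(mask_to_pus(mask))].append(pid)

for pus, pids in sorted(pids_by_pus.items()):
    flag = "OVERLAP" if len(pids) > 1 else "ok"
    print(f"PUs {list(pus)}: pids {pids} {flag}")
```

With the four masks above this reports pids 16065 and 16067 together on PU 9, 
and pids 16066 and 16068 together on PU 11.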

I checked that the pids were part of the correct CPU set. I also checked orted:

[root@nyx5398 ~]# hwloc-bind --get --pid 16064
0x0f00
[root@nyx5398 ~]# hwloc-calc --intersect PU 16064
ignored unrecognized argument 16064

[root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
8,9,10,11

Which is exactly what I would expect.
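
The same comparison can be done without hwloc-calc. The following is a rough 
Python sketch under the assumption that the cpus file uses the usual 
comma-and-range list format (such as 8-11) and that masks are single hex words; 
parse_cpu_list and mask_to_pus are hypothetical helper names, not hwloc or 
torque APIs.

```python
def parse_cpu_list(text):
    """Expand a cpuset list such as '8-11' or '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def mask_to_pus(mask_hex):
    """Return the set of PU indices set in a hex bitmask such as '0x0f00'."""
    mask = int(mask_hex, 16)
    return {bit for bit in range(mask.bit_length()) if mask >> bit & 1}

cpuset = parse_cpu_list("8-11")   # contents of the job's cpus file
orted = mask_to_pus("0x0f00")     # orted binding (pid 16064)
ranks = mask_to_pus("0x0200") | mask_to_pus("0x0800")  # union of rank bindings

print("orted matches cpuset:", orted == cpuset)
print("CPUs in cpuset unused by ranks:", sorted(cpuset - ranks))
```

For the values in this job, orted covers the whole cpuset (8-11), while CPUs 8 
and 10 are not used by any of the four ranks.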

So ummm, I'm lost as to why this might happen. What else should I check? Like I 
said, not all jobs show this behavior.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985







[OMPI users] make check with external libltdl

2014-06-18 Thread Pascal Paschos
make check fails when 1.8.1 is built with an external libltdl. The system 
libtool did not have the libltdl headers, so we provided an external 
installation of libtool. The source compiles and builds but fails this test:

make[3]: *** No rule to make target `../../opal/libltdl/libltdlc.la', needed by 
`dlopen_test'.  Stop.

When building with --disable-dlopen (which disables libltdl anyway) or with the 
internal libltdl, make check returns no errors. Just curious for an 
explanation... if there is any.