On Fri, May 8, 2015 at 4:45 PM, Malcolm Tobias <mtob...@wustl.edu> wrote:
>
> Szilárd,
>
> On Friday 08 May 2015 15:56:12 Szilárd Páll wrote:
>> What's being utilized vs what's being started are different things. If
>> you don't believe the mdrun output - which is quite likely not wrong
>> about the 2 ranks x 4 threads - use your favorite tool to check the
>> number of ranks and threads started and their placement. That will
>> explain what's going on...
>
> Good point. If I use 'ps -L' I can see the OpenMP threads:
>
> [root@gpu21 ~]# ps -Lfu mtobias
> UID       PID  PPID   LWP  C NLWP STIME TTY   TIME     CMD
> mtobias  9830  9828  9830  0    1 09:28 ?     00:00:00 sshd: mtobias@pts/0
> mtobias  9831  9830  9831  0    1 09:28 pts/0 00:00:00 -bash
> mtobias  9989  9831  9989  0    2 09:33 pts/0 00:00:00 mpirun -np 2 mdrun_mp
> mtobias  9989  9831  9991  0    2 09:33 pts/0 00:00:00 mpirun -np 2 mdrun_mp
> mtobias  9990  9831  9990  0    1 09:33 pts/0 00:00:00 tee mdrun.out
> mtobias  9992  9989  9992 38    7 09:33 pts/0 00:00:02 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989  9994  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989  9998  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989 10000  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989 10001 16    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989 10002 16    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9992  9989 10003 16    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989  9993 73    7 09:33 pts/0 00:00:05 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989  9995  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989  9999  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989 10004  0    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989 10005 12    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989 10006 12    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
> mtobias  9993  9989 10007 12    7 09:33 pts/0 00:00:00 mdrun_mpi -ntomp 4 -v
>
> but top only shows 2 CPUs being utilized:
>
> top - 09:33:42 up 37 days, 19:48, 2 users, load average: 2.13, 1.05, 0.68
> Tasks: 517 total, 3 running, 514 sleeping, 0 stopped, 0 zombie
> Cpu0  : 98.7%us, 1.3%sy, 0.0%ni,  0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1  : 98.7%us, 1.3%sy, 0.0%ni,  0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2  :  0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3  :  0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu5  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu8  :  0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu9  :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu10 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu11 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu12 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu13 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu14 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu15 :  0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem:  132053748k total, 23817664k used, 108236084k free,   268628k buffers
> Swap:   4095996k total,     1884k used,  4094112k free, 15600572k cached
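(Side note: ps -L shows you the threads, but not where they are allowed to
run. To see the actual affinity mask of every thread, something along these
lines should work on Linux - taking rank PID 9992 from your listing as the
example:

  # query the CPU affinity list of each task (thread) of PID 9992
  for tid in /proc/9992/task/*; do taskset -cp "${tid##*/}"; done

taskset prints the list of logical CPUs each thread may run on; if the
CPUSET confines the job to cores 0-1, all threads should report just those
two, which would match what top shows.)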
>> Very likely that's exactly what's screwing things up. We try to be
>> nice and back off (mdrun should note that on the output) when
>> affinities are set externally, assuming that they are set for a good
>> reason and to correct values. Sadly, that assumption often proves to
>> be wrong. Try running with "-pin on", or turn off the CPUSET-ing (or
>> double-check that it's right).
>
> I wouldn't expect the CPUSETs to be problematic, I've been using them with
> Gromacs for over a decade now ;-)

Thread affinity setting within mdrun has been employed since v4.6; we do it
on a per-thread basis, and not doing it can lead to pretty severe performance
degradation when using multi-threading. Depending on the Linux kernel, OS
jitter, and the type/speed/scale of the simulation, even MPI-only runs will
see a benefit from correct affinity settings.

Hints:
- some useful mdrun command-line arguments: "-pin on", "-pinoffset N", "-pinstride N" (see the example below)
- more details: http://www.gromacs.org/Documentation/Acceleration_and_parallelization
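For example, to share one 16-core node like yours between two independent
8-thread runs, something along these lines should keep them off each other's
cores (a sketch - the job names are made up, the offsets assume contiguously
numbered cores, and with a thread-MPI mdrun you would drop the mpirun part):

  mpirun -np 1 mdrun_mpi -ntomp 8 -pin on -pinoffset 0 -deffnm job1 &
  mpirun -np 1 mdrun_mpi -ntomp 8 -pin on -pinoffset 8 -deffnm job2 &

mdrun then pins the first job's threads starting at logical core 0 and the
second job's starting at logical core 8, so the two runs never compete for
the same cores - which is presumably what the CPUSETs are meant to
guarantee anyway.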
> If I use '-pin on' it appears to be utilizing 8 CPU-cores as expected:
>
> [mtobias@gpu21 Gromacs_Test]$ mpirun -np 2 mdrun_mpi -ntomp 4 -pin on -v -deffnm PolyA_Heli_J_hi_equil
>
> top - 09:36:26 up 37 days, 19:50, 2 users, load average: 1.00, 1.14, 0.78
> Tasks: 516 total, 4 running, 512 sleeping, 0 stopped, 0 zombie
> Cpu0 : 78.9%us, 2.7%sy, 0.0%ni, 18.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 63.7%us, 0.3%sy, 0.0%ni, 36.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 65.6%us, 0.3%sy, 0.0%ni, 33.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 64.9%us, 0.3%sy, 0.0%ni, 34.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4 : 80.7%us, 2.7%sy, 0.0%ni, 16.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu5 : 64.0%us, 0.3%sy, 0.0%ni, 35.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 : 62.0%us, 0.3%sy, 0.0%ni, 37.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 60.3%us, 0.3%sy, 0.0%ni, 39.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>
> Weird. I wonder if anyone else has experience using pinning with CPUSETs?

What is your goal with using CPUSETs? Node sharing?

--
Szilárd

> Malcolm
>
> --
> Malcolm Tobias
> 314.362.1594
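PS: If you want to double-check what a CPUSET actually allows, the kernel
exposes that under /proc as well; again taking PID 9992 from your earlier
listing as the example:

  cat /proc/9992/cpuset                     # which cpuset the process is in
  grep Cpus_allowed_list /proc/9992/status  # which logical CPUs it may use

If the latter says "0-1", it is the CPUSET, not mdrun, that limits the run
to two cores.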