On Fri, Jul 19, 2013 at 6:59 PM, gigo <g...@ibb.waw.pl> wrote:
> Hi!
>
> On 2013-07-17 21:08, Mark Abraham wrote:
>> You tried ppn3 (with and without --loadbalance)?
>
> I was testing on an 8-replica simulation.
>
> 1) Without --loadbalance and -np 8.
> Excerpts from the script:
> #PBS -l nodes=8:ppn=3
> setenv OMP_NUM_THREADS 4
> mpiexec mdrun_mpi -v -cpt 20 -multi 8 -ntomp 4 -replex 2500 -cpi -pin on
>
> Excerpts from logs:
> Using 3 MPI processes
> Using 4 OpenMP threads per MPI process
> (...)
> Overriding thread affinity set outside mdrun_mpi
>
> Pinning threads with an auto-selected logical core stride of 1
>
> WARNING: In MPI process #0: Affinity setting for 1/4 threads failed.
>          This can cause performance degradation! If you think your setting are
>          correct, contact the GROMACS developers.
>
> WARNING: In MPI process #2: Affinity setting for 4/4 threads failed.
>
> Load: The job was allocated 24 cores (3 cores on 8 different nodes). Each
> OpenMP thread uses ~1/3 of a CPU core on average.
> Conclusions: MPI starts as many processes as cores requested (nnodes*ppn = 24)
> and ignores the OMP_NUM_THREADS env ==> this is wrong, and it is not a Gromacs
> issue. Each MPI process forks into 4 threads as requested. The 24-core limit
> granted by Torque is not violated.
>
> 2) The same script, but with -np 8, to limit the number of MPI processes to
> the number of replicas.
>
> Logs:
> Using 1 MPI process
> Using 4 OpenMP threads
> (...)
> Replicas 0, 3 and 6: WARNING: Affinity setting for 1/4 threads failed.
> Replicas 1, 2, 4, 5, 7: WARNING: Affinity setting for 4/4 threads failed.
>
> Load: The job was allocated 24 cores on 8 nodes. mpiexec ran only on the
> first 3 nodes. Each OpenMP thread uses ~20% of a CPU core.
>
> 3) -np 8 --loadbalance
> Excerpts from logs:
> Using 1 MPI process
> Using 4 OpenMP threads
> (...)
> Each replica says: WARNING: Affinity setting for 3/4 threads failed.
>
> Load: MPI processes spread evenly on all 8 nodes.
> Each OpenMP thread uses ~50% of a CPU core.
>
> 4) -np 8 --loadbalance, #PBS -l nodes=8:ppn=4 <== this worked ~OK with
> gromacs 4.6.2
> Logs:
> WARNING: Affinity setting for 2/4 threads failed.
>
> Load: 32 cores allocated on 8 nodes. MPI processes spread evenly; each
> OpenMP thread uses ~70% of a CPU core.
> With 144 replicas the simulation did not produce any results; it just got
> stuck.
>
> Some thoughts: the main problem is most probably in the way MPI interprets
> the information from Torque; it is not Gromacs-related. MPI ignores
> OMP_NUM_THREADS. The environment is just broken. Since gromacs-4.6.2 behaved
> better than 4.6.3 there, I am going back to it.
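The load figures in case 1 are consistent with mpiexec starting one MPI rank per allocated core while every rank still forks 4 OpenMP threads. A rough sanity check of that arithmetic, using the numbers reported above (the variable names are made up for illustration; this is not part of the original job script):

```shell
#!/bin/sh
# Case 1: mpiexec ignores OMP_NUM_THREADS and launches one rank per
# allocated core (nodes * ppn); each rank still forks 4 OpenMP threads.
nodes=8
ppn=3
threads_per_rank=4
ranks=$((nodes * ppn))                        # 24 MPI ranks, not the 8 wanted
cores=$((nodes * ppn))                        # 24 cores granted by Torque
total_threads=$((ranks * threads_per_rank))   # 96 OpenMP threads in total
echo "$ranks ranks / $total_threads threads on $cores cores"
echo "nominal share per thread: 1/$((total_threads / cores)) of a core"
```

The nominal 1/4 of a core per thread is in line with the roughly one-third of a core observed in the report.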
FYI: unless you are setting thread affinities manually or through the job scheduler, you are advised to use 4.6.3, because the mdrun internal affinity setting has a bug in 4.6.2 (and the "better" behavior may actually be caused by the non-functional affinity setting).

> Best,
>
> G
>
>> Mark
>>
>> On Wed, Jul 17, 2013 at 6:30 PM, gigo <g...@ibb.waw.pl> wrote:
>>> On 2013-07-13 11:10, Mark Abraham wrote:
>>>> On Sat, Jul 13, 2013 at 1:24 AM, gigo <g...@ibb.waw.pl> wrote:
>>>>> On 2013-07-12 20:00, Mark Abraham wrote:
>>>>>> On Fri, Jul 12, 2013 at 4:27 PM, gigo <g...@ibb.waw.pl> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> On 2013-07-12 11:15, Mark Abraham wrote:
>>>>>>>> What does --loadbalance do?
>>>>>>>
>>>>>>> It balances the total number of processes across all allocated nodes.
>>>>>>
>>>>>> OK, but using it means you are hostage to its assumptions about
>>>>>> balance.
>>>>>
>>>>> That's true, but as long as I do not try to use more resources than
>>>>> Torque gives me, everything is OK. The question is: what is the proper
>>>>> way of running multiple simulations in parallel with MPI, each further
>>>>> parallelized with OpenMP, when pinning fails? I could not find any
>>>>> other.
>>>>
>>>> I think pinning fails because you are double-crossing yourself. You do
>>>> not want 12 MPI processes per node, and that is likely what ppn is
>>>> setting. AFAIK your setup should work, but I haven't tested it.
>>>>
>>>>>>> The thing is that mpiexec does not know that I want each replica to
>>>>>>> fork into 4 OpenMP threads.
>>>>>>> Thus, without this option and without affinities (in a sec about
>>>>>>> it), mpiexec starts too many replicas on some nodes - gromacs then
>>>>>>> complains about the overload - while some cores on other nodes are
>>>>>>> not used. It is possible to run my simulation like that:
>>>>>>>
>>>>>>> mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi
>>>>>>> (without --loadbalance for mpiexec and without -ntomp for mdrun)
>>>>>>>
>>>>>>> Then each replica runs on 4 MPI processes (I allocate 4 times more
>>>>>>> cores than replicas, and mdrun sees it). The problem is that it is
>>>>>>> much slower than using OpenMP for each replica. I did not find any
>>>>>>> other way than --loadbalance in mpiexec and then -multi 144 -ntomp 4
>>>>>>> in mdrun to use MPI and OpenMP at the same time on the
>>>>>>> Torque-controlled cluster.
>>>>>>
>>>>>> That seems highly surprising. I have not yet encountered a job
>>>>>> scheduler that was completely lacking a "do what I tell you" layout
>>>>>> scheme. More importantly, why are you using #PBS -l nodes=48:ppn=12?
>>>>>
>>>>> I think that Torque is very similar to all PBS-like resource managers
>>>>> in this regard. It actually does what I tell it to do. There are
>>>>> 12-core nodes, I ask for 48 of them - I get them (a simple
>>>>> #PBS -l ncpus=576 does not work), end of story. Now, the program that
>>>>> I run is responsible for populating the resources that I got.
>>>>
>>>> No, that's not the end of the story. The scheduler and the MPI system
>>>> typically cooperate to populate the MPI processes on the hardware, set
>>>> OMP_NUM_THREADS, set affinities, etc. mdrun honours those if they are
>>>> set.
>>>
>>> I was able to run what I wanted flawlessly on another cluster with
>>> PBS-Pro. The Torque cluster seems to work like I said ("the end of
>>> story" behaviour).
>>> REMD runs well on Torque when I give a whole physical node to one
>>> replica. Otherwise the simulation does not go, or the pinning fails
>>> (sometimes partially). I ran out of options; I did not find any working
>>> example/documentation on running hybrid MPI/OpenMP jobs under Torque.
>>> It seems that I stumbled upon limitations of this resource manager, and
>>> it is not really a Gromacs issue.
>>> Best Regards,
>>> Grzegorz
>>>
>>>> You seem to be using 12 because you know there are 12 cores per node.
>>>> The scheduler should know that already. ppn should be a command about
>>>> what to do with the hardware, not a description of what it is. More to
>>>> the point, you should read the docs and be sure what it does.
>>>>
>>>>>> Surely you want 3 MPI processes per 12-core node?
>>>>>
>>>>> Yes - I want each node to run 3 MPI processes. Preferably, I would
>>>>> like to run each MPI process on a separate node (spread over 12 cores
>>>>> with OpenMP), but I will not get that many resources. But again,
>>>>> without the --loadbalance hack I would not be able to properly
>>>>> populate the nodes...
>>>>
>>>> So try ppn 3!
>>>>
>>>>>>>> What do the .log files say about OMP_NUM_THREADS, thread
>>>>>>>> affinities, pinning, etc?
>>>>>>>
>>>>>>> Each replica logs:
>>>>>>> "Using 1 MPI process
>>>>>>> Using 4 OpenMP threads"
>>>>>>> That is correct. As I said, the threads are forked, but 3 out of 4
>>>>>>> don't do anything, and the simulation does not go at all.
>>>>>>>
>>>>>>> About affinities Gromacs says:
>>>>>>> "Can not set thread affinities on the current platform. On NUMA
>>>>>>> systems this can cause performance degradation. If you think your
>>>>>>> platform should support setting affinities, contact the GROMACS
>>>>>>> developers."
>>>>>>>
>>>>>>> Well, the "current platform" is a normal x86_64 cluster, but the
>>>>>>> whole information about resources is passed by Torque to the
>>>>>>> OpenMPI-linked Gromacs. Can it be that mdrun sees the resources
>>>>>>> allocated by Torque as one big pool of cpus and misses the
>>>>>>> information about node topology?
>>>>>>
>>>>>> mdrun gets its processor topology from the MPI layer, so that is
>>>>>> where you need to focus. The error message confirms that GROMACS
>>>>>> sees things that seem wrong.
>>>>>
>>>>> Thank you, I will take a look. But the first thing I want to do is to
>>>>> find the reason why Gromacs 4.6.3 is not able to run on my (slightly
>>>>> weird, I admit) setup, while 4.6.2 does it very well.
>>>>
>>>> 4.6.2 had a bug that inhibited any MPI-based mdrun from attempting to
>>>> set affinities. It's still not clear why ppn 12 worked at all.
>>>> Apparently mdrun was able to float some processes around to get
>>>> something that worked. The good news is that when you get it working
>>>> in 4.6.3, you will see a performance boost.
>>>>
>>>> Mark

-- 
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
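For the record, the layout the thread converges on (one MPI rank per replica, three ranks per 12-core node, -ntomp matching OMP_NUM_THREADS) could be written as a job script roughly like the one below. This is a sketch only: it assumes the 144-replica production case on 12-core nodes and an Open MPI mpiexec that supports -npernode; whether a given scheduler/MPI combination actually honours this placement is exactly what is in question above, so it is untested, not a verified recipe.

```shell
#!/bin/csh
# Hypothetical layout: 144 replicas, 4 OpenMP threads each, on 12-core
# nodes -> 3 ranks x 4 threads = 12 threads per node, 48 nodes in total.
#PBS -l nodes=48:ppn=12
setenv OMP_NUM_THREADS 4
# -npernode places exactly 3 ranks on each node (Open MPI option; assumed
# available here), replacing the --loadbalance workaround discussed above.
mpiexec -np 144 -npernode 3 mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000 -cpi -pin on
```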