Just to wrap up this thread: it does work when mpirun is properly configured. I knew it had to be my fault :)

Something like this works like a charm:

mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v

Thank you Mark and Szilárd for your invaluable expertise.

Best regards,
João Henriques
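For completeness, here is a sketch of the full submission script with that fix folded in. It is based on the failing script quoted below (same modules, same input files), so treat the module versions and file names as placeholders for your own cluster; the substantive changes are -npernode 2 instead of -np 4, plus an explicit -ntomp 8:

--- START ---
#!/bin/sh
#
# Job name
#SBATCH -J md
#
# 2 nodes, exclusive access
#SBATCH -N 2
#SBATCH --exclusive
#
# Time needed to complete the job
#SBATCH -t 48:00:00
#
# Add modules (as in the original script below)
module load gcc/4.6.3
module load openmpi/1.6.3/gcc/4.6.3
module load cuda/5.0
module load gromacs/4.6
#
grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
#
# 2 MPI ranks per node (one per K20m GPU) * 2 nodes = 4 ranks total,
# each running 8 OpenMP threads: 4 * 8 = 32 cores, i.e. both 16-core
# nodes filled with no oversubscription. -gpu_id 01 maps the two ranks
# on each node to GPUs 0 and 1.
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
--- END ---

The point of -npernode is that rank placement is Open MPI's job, not GROMACS's: mdrun_mpi only ever sees the ranks MPI starts on each node, so with 2 ranks and 2 GPUs per node the launch configuration finally matches.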
On Wed, Jun 5, 2013 at 4:21 PM, João Henriques <joao.henriques.32...@gmail.com> wrote:

> Ok, thanks once again. I will do my best to overcome this issue.
>
> Best regards,
> João Henriques
>
> On Wed, Jun 5, 2013 at 3:33 PM, Mark Abraham <mark.j.abra...@gmail.com> wrote:
>
>> On Wed, Jun 5, 2013 at 2:53 PM, João Henriques <joao.henriques.32...@gmail.com> wrote:
>>
>>> Sorry to keep bugging you guys, but even after considering all you
>>> suggested and reading the bugzilla thread Mark pointed out, I'm still
>>> unable to make the simulation run over multiple nodes.
>>>
>>> *Here is a template of a simple submission over 2 nodes:*
>>>
>>> --- START ---
>>> #!/bin/sh
>>> #
>>> # Job name
>>> #SBATCH -J md
>>> #
>>> # No. of nodes and no. of processors per node
>>> #SBATCH -N 2
>>> #SBATCH --exclusive
>>> #
>>> # Time needed to complete the job
>>> #SBATCH -t 48:00:00
>>> #
>>> # Add modules
>>> module load gcc/4.6.3
>>> module load openmpi/1.6.3/gcc/4.6.3
>>> module load cuda/5.0
>>> module load gromacs/4.6
>>> #
>>> grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
>>> mpirun -np 4 mdrun_mpi -gpu_id 01 -deffnm md -v
>>> --- END ---
>>>
>>> *Here is an extract of the md.log:*
>>>
>>> --- START ---
>>> Using 4 MPI processes
>>> Using 4 OpenMP threads per MPI process
>>>
>>> Detecting CPU-specific acceleration.
>>> Present hardware specification:
>>> Vendor: GenuineIntel
>>> Brand: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>>> Family: 6 Model: 45 Stepping: 7
>>> Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr
>>> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3
>>> sse4.1 sse4.2 ssse3 tdt x2apic
>>> Acceleration most likely to fit this hardware: AVX_256
>>> Acceleration selected at GROMACS compile time: AVX_256
>>>
>>> 2 GPUs detected on host en001:
>>> #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>>> #1: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
>>>
>>> -------------------------------------------------------
>>> Program mdrun_mpi, VERSION 4.6
>>> Source code file:
>>> /lunarc/sw/erik/src/gromacs/gromacs-4.6/src/gmxlib/gmx_detect_hardware.c,
>>> line: 322
>>>
>>> Fatal error:
>>> Incorrect launch configuration: mismatching number of PP MPI processes
>>> and GPUs per node.
>>
>> "per node" is critical here.
>>
>>> mdrun_mpi was started with 4 PP MPI processes per node, but you
>>> provided 2 GPUs.
>>
>> ...and here. As far as mdrun_mpi knows from the MPI system, there are
>> only MPI ranks on this one node.
>>
>>> For more information and tips for troubleshooting, please check the
>>> GROMACS website at http://www.gromacs.org/Documentation/Errors
>>> -------------------------------------------------------
>>> --- END ---
>>>
>>> As you can see, gmx is having trouble understanding that there's a
>>> second node available. Note that since I did not specify -ntomp, it
>>> assigned 4 threads to each of the 4 MPI processes (filling the entire
>>> available 16 CPUs *on one node*).
>>> For the same exact submission, if I do set "-ntomp 8" (since I have
>>> 4 MPI procs * 8 OpenMP threads = 32 CPUs total on the 2 nodes), I get
>>> a warning telling me that I'm hyperthreading, which can only mean that
>>> *gmx is assigning all processes to the first node once again.*
>>> Am I doing something wrong, or is there some problem with gmx-4.6? I
>>> guess it can only be my fault, since I've never seen anyone else
>>> complaining about the same issue here.
>>
>> Assigning MPI processes to nodes is a matter of configuring your MPI.
>> GROMACS just follows the information it gets from the MPI system -
>> hence the oversubscription. If you assign two MPI processes to each
>> node, then things should work.
>>
>> Mark
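One last tip for anyone who lands on this thread with the same fatal error: you can check how mpirun distributes ranks without involving GROMACS at all. A minimal sanity check, assuming the same Open MPI as in the module list above:

--- START ---
# Each node's hostname should appear twice (2 ranks per node).
# If one hostname appears 4 times, every rank landed on a single node
# and mdrun_mpi will again see 4 PP ranks but only 2 GPUs.
mpirun -npernode 2 hostname
--- END ---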