Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
On Wed, Jun 5, 2013 at 4:35 PM, João Henriques joao.henriques.32...@gmail.com wrote:

Just to wrap up this thread, it does work when the mpirun is properly configured. I knew it had to be my fault :) Something like this works like a charm:

mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v

That is indeed the correct way to launch the simulation: this way you'll have two ranks per node, each using a different GPU.

However, coming back to your initial (non-working) launch config, if you want to run 4 ranks x 4 threads per node, you'll have to assign two ranks to each GPU:

mpirun -np 4 mdrun_mpi -gpu_id 0011 -deffnm md -v

If OpenMP multi-threading scaling is limiting performance, the above will help (compared to 2 ranks x 8 threads per node). This is often the case on AMD, and I've seen cases where it helped already on a single Intel node.

I'd like to point out one more thing which is important when you run on more than a node or two. GPU-accelerated runs don't switch to using separate PME ranks, mostly because it's very hard to pick settings for distributing cores between PP and PME ranks automatically. However, already from around two to three nodes, you will get better performance by using separate PME ranks. You should experiment with devoting part of the cores (half is usually a decent choice) to PME, either by running 2 PP + 1 PME or 2 PP + 2 PME ranks per node.

Thank you Mark and Szilárd for your invaluable expertise.

Welcome!

--
Szilárd

Best regards,
João Henriques

[earlier quoted messages trimmed; they appear in full below in this thread]
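To make the separate-PME advice concrete, here is a minimal sketch (not a command from the thread) of the "2 PP + 2 PME per node" variant on the same 2-node, 16-core, 2-GPU-per-node cluster. The -npme flag is a real mdrun 4.6 option for dedicating ranks to PME; the exact rank split and the assumption that -gpu_id maps only the PP ranks on each node are illustrative and should be checked against the log output:

--- START ---
# 2 nodes x 4 ranks = 8 ranks; -npme 4 makes half of them PME-only,
# giving 2 PP + 2 PME ranks per node with 4 OpenMP threads each.
# The two PP ranks on each node take GPUs 0 and 1.
mpirun -npernode 4 mdrun_mpi -npme 4 -ntomp 4 -gpu_id 01 -deffnm md -v
--- END ---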
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
Thank you very much for both contributions. I will conduct some tests to assess which approach works best for my system. Much appreciated,

Best regards,
João Henriques

On Tue, Jun 4, 2013 at 6:30 PM, Szilárd Páll szilard.p...@cbr.su.se wrote: [quoted message trimmed; it appears in full below in this thread]

--
João Henriques
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
Sorry to keep bugging you guys, but even after considering all you suggested and reading the bugzilla thread Mark pointed out, I'm still unable to make the simulation run over multiple nodes.

*Here is a template of a simple submission over 2 nodes:*

--- START ---
#!/bin/sh
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#
# Job name
#SBATCH -J md
#
# No. of nodes and no. of processors per node
#SBATCH -N 2
#SBATCH --exclusive
#
# Time needed to complete the job
#SBATCH -t 48:00:00
#
# Add modules
module load gcc/4.6.3
module load openmpi/1.6.3/gcc/4.6.3
module load cuda/5.0
module load gromacs/4.6
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#
grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
mpirun -np 4 mdrun_mpi -gpu_id 01 -deffnm md -v
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
--- END ---

*Here is an extract of the md.log:*

--- START ---
Using 4 MPI processes
Using 4 OpenMP threads per MPI process

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Family: 6  Model: 45  Stepping: 7
Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: AVX_256

2 GPUs detected on host en001:
#0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
#1: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
---
Program mdrun_mpi, VERSION 4.6
Source code file: /lunarc/sw/erik/src/gromacs/gromacs-4.6/src/gmxlib/gmx_detect_hardware.c, line: 322

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.
mdrun_mpi was started with 4 PP MPI processes per node, but you provided 2 GPUs.
For more information and tips for troubleshooting, please check the GROMACS website at http://www.gromacs.org/Documentation/Errors
---
--- END ---

As you can see, gmx is having trouble understanding that there's a second node available. Note that since I did not specify -ntomp, it assigned 4 threads to each of the 4 MPI processes, filling all 16 available CPU cores *on one node*. For the exact same submission, if I set -ntomp 8 (since 4 MPI procs * 8 OpenMP threads = 32 CPUs total on the 2 nodes), I get a warning telling me that I'm hyperthreading, which can only mean that *gmx is assigning all processes to the first node once again.*

Am I doing something wrong or is there some problem with gmx-4.6? I guess it can only be my fault, since I've never seen anyone else complaining about the same issue here.

*Here are the cluster spec details:* http://www.lunarc.lu.se/Systems/ErikDetails

Thank you for your patience and expertise,

Best regards,
João Henriques

On Tue, Jun 4, 2013 at 6:30 PM, Szilárd Páll szilard.p...@cbr.su.se wrote: [quoted message trimmed; it appears in full below in this thread]
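For reference, the configuration that ultimately fixed this (see the wrap-up at the top of this thread) tells mpirun explicitly to place two ranks on each node. A minimal sketch against the same script, assuming Open MPI's -npernode option:

--- START ---
# 2 nodes x 2 ranks = 4 ranks total, one rank per GPU,
# 8 OpenMP threads per rank (-gpu_id is interpreted per node)
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
--- END ---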
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
On Wed, Jun 5, 2013 at 2:53 PM, João Henriques joao.henriques.32...@gmail.com wrote: [quoted message trimmed; it appears in full above, except for the error extract Mark comments on]

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes and GPUs per node.

"per node" is critical here.

mdrun_mpi was started with 4 PP MPI processes per node, but you provided 2 GPUs.

...and here. As far as mdrun_mpi can tell from the MPI system, all of the MPI ranks are on this one node.

Assigning MPI processes to nodes is a matter of configuring your MPI. GROMACS just follows the MPI system information it gets from MPI - hence the oversubscription. If you assign two MPI processes to each node, then things should work.

Mark
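Two ways of expressing that placement with Open MPI (the MPI stack loaded in the script above); both option spellings exist in Open MPI 1.6, but other MPI implementations name them differently, so this is a sketch rather than a universal recipe:

--- START ---
# Explicitly request 2 ranks per node...
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
# ...or round-robin the 4 ranks across nodes instead of filling
# the first node's slots
mpirun -np 4 -bynode mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
--- END ---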
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
Ok, thanks once again. I will do my best to overcome this issue.

Best regards,
João Henriques

On Wed, Jun 5, 2013 at 3:33 PM, Mark Abraham mark.j.abra...@gmail.com wrote: [quoted message trimmed; it appears in full above]

--
João Henriques
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
Just to wrap up this thread, it does work when the mpirun is properly configured. I knew it had to be my fault :)

Something like this works like a charm:

mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v

Thank you Mark and Szilárd for your invaluable expertise.

Best regards,
João Henriques

On Wed, Jun 5, 2013 at 4:21 PM, João Henriques joao.henriques.32...@gmail.com wrote: [quoted thread trimmed; the messages appear in full above]

--
João Henriques
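For completeness, a sketch of the submission template from earlier in the thread with the working launch line dropped in; only the mpirun line differs from João's original script:

--- START ---
#!/bin/sh
#SBATCH -J md
#SBATCH -N 2
#SBATCH --exclusive
#SBATCH -t 48:00:00
#
module load gcc/4.6.3
module load openmpi/1.6.3/gcc/4.6.3
module load cuda/5.0
module load gromacs/4.6
#
grompp -f md.mdp -c npt.gro -t npt.cpt -p topol -o md.tpr
# 2 ranks per node, one per GPU, 8 OpenMP threads each
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
--- END ---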
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
Yes, the documentation and output are not optimal. Resources are limited, sorry. Some of these issues are discussed in http://bugzilla.gromacs.org/issues/1135.

The good news is that it sounds like you are having a non-problem: the output tacitly assumes the nodes are homogeneous, so only the first node's hardware is reported. If your performance scales linearly over small numbers of nodes, then you're doing fine.

Mark

On Tue, Jun 4, 2013 at 3:31 PM, João Henriques joao.henriques.32...@gmail.com wrote: [original post quoted in full below in this thread]
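A quick way to check that scaling (a sketch; the log file names here are illustrative) is to compare the ns/day figure that mdrun prints in the "Performance:" summary at the end of each run's log:

--- START ---
# Compare the final Performance: line (ns/day, hour/ns) across
# runs on 1, 2 and 4 nodes
grep -H "Performance:" md_1node.log md_2node.log md_4node.log
--- END ---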
Re: [gmx-users] Running gmx-4.6.x over multiple homogeneous nodes with GPU acceleration
mdrun is not blind; it's just that the current design does not report the hardware of all compute nodes used. Whatever CPU/GPU hardware mdrun reports in the log/std output is *only* what rank 0, i.e. the first MPI process, detects. If you have a heterogeneous hardware configuration, in most cases you should be able to run just fine, but you'll still get only the hardware the first rank sits on reported.

Hence, if you want to run on 5 of the nodes you mention, you just do:

mpirun -np 10 mdrun_mpi [-gpu_id 01]

You may want to try both -ntomp 8 and -ntomp 16 (using HyperThreading does not always help). Also note that if you use GPU sharing among ranks (in order to use 8 threads/rank), disabling dynamic load balancing may help for some technical reasons - especially if you have a homogeneous simulation system (and hardware setup).

Cheers,
--
Szilárd

On Tue, Jun 4, 2013 at 3:31 PM, João Henriques joao.henriques.32...@gmail.com wrote:

Dear all,

Since gmx-4.6 came out, I've been particularly interested in taking advantage of the native GPU acceleration for my simulations. Luckily, I have access to a cluster with the following specs PER NODE:

CPU: 2x E5-2650 (2.0 GHz, 8-core)
GPU: 2x Nvidia K20

I've become quite familiar with the heterogeneous parallelization and multiple-MPI-ranks-per-GPU schemes on a SINGLE NODE. Everything works fine, no problems at all.

Currently, I'm working with a nasty system comprising 608159 tip3p water molecules, and it would really help to speed things up a bit. Therefore, I would really like to parallelize my system over multiple nodes and keep the GPU acceleration. I've tried many different command combinations, but mdrun seems to be blind to the GPUs existing on other nodes. It always finds GPUs #0 and #1 on the first node and tries to fit everything into these, completely disregarding the existence of the other GPUs on the remaining requested nodes. Once again, note that all nodes have exactly the same specs.

Literature on the official gmx website is not, well... you know... in-depth, and I would really appreciate it if someone could shed some light on this subject.

Thank you,
Best regards,
--
João Henriques
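Putting Szilárd's two suggestions together with the -npernode placement fix that later resolved the thread, a sketch for 5 of these nodes; the rank counts and the use of -dlb no (a real mdrun 4.6 flag that disables dynamic load balancing) are illustrative:

--- START ---
# 2 ranks per node, one per GPU; try 8 vs 16 OpenMP threads per rank
# (16 uses HyperThreading, which does not always help)
mpirun -npernode 2 mdrun_mpi -ntomp 8 -gpu_id 01 -deffnm md -v
mpirun -npernode 2 mdrun_mpi -ntomp 16 -gpu_id 01 -deffnm md -v
# GPU sharing: 4 ranks per node, 2 ranks per GPU, dynamic load
# balancing off, as suggested for homogeneous systems
mpirun -npernode 4 mdrun_mpi -ntomp 4 -gpu_id 0011 -dlb no -deffnm md -v
--- END ---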