On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas <renato...@gmail.com> wrote:
> Hi gromacs users,
>
> I have installed the latest version of gromacs (4.5.1) on an i7 980X
> (6 cores, or 12 with HT on; 3.3 GHz) with 12 GB of RAM and compiled its
> mpi version. I also compiled the GPU-accelerated version of gromacs.
> Then I ran a 2 ns simulation of a small system (11042 atoms) to compare
> the performance of mdrun-gpu vs mdrun_mpi. The results I got are below:
>
> ############################################
> My *.mdp is:
>
> constraints        = all-bonds
> integrator         = md
> dt                 = 0.002    ; ps !
> nsteps             = 1000000  ; total 2000 ps.
> nstlist            = 10
> ns_type            = grid
> coulombtype        = PME
> rvdw               = 0.9
> rlist              = 0.9
> rcoulomb           = 0.9
> fourierspacing     = 0.10
> pme_order          = 4
> ewald_rtol         = 1e-5
> vdwtype            = cut-off
> pbc                = xyz
> epsilon_rf         = 0
> comm_mode          = linear
> nstxout            = 1000
> nstvout            = 0
> nstfout            = 0
> nstxtcout          = 1000
> nstlog             = 1000
> nstenergy          = 1000
> ; Berendsen temperature coupling is on in four groups
> tcoupl             = berendsen
> tc-grps            = system
> tau-t              = 0.1
> ref-t              = 298
> ; Pressure coupling is on
> Pcoupl             = berendsen
> pcoupltype         = isotropic
> tau_p              = 0.5
> compressibility    = 4.5e-5
> ref_p              = 1.0
> ; Generate velocities is on at 298 K.
> gen_vel            = no
>
> ########################
> RUNNING GROMACS ON GPU
>
> mdrun-gpu -s topol.tpr -v >& out &
>
> Here is a part of the md.log:
>
> Started mdrun on node 0 Wed Oct 20 09:52:09 2010
> .
> .
> .
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  Computing:          Nodes    Number     G-Cycles    Seconds      %
> --------------------------------------------------------------------
>  Write traj.             1      1021      106.075       31.7     0.2
>  Rest                    1               64125.577    19178.6    99.8
> --------------------------------------------------------------------
>  Total                   1               64231.652    19210.3   100.0
> --------------------------------------------------------------------
>
>                NODE (s)   Real (s)      (%)
>        Time:   6381.840  19210.349     33.2
>                          1h46:21
>                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
> Performance:      0.000      0.001     27.077      0.886
>
> Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
>
> ########################
> RUNNING GROMACS ON MPI
>
> mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v >& out &
>
> Here is a part of the md.log:
>
> Started mdrun on node 0 Wed Oct 20 18:30:52 2010
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  Computing:          Nodes    Number     G-Cycles    Seconds      %
> --------------------------------------------------------------------
>  Domain decomp.          3    100001      1452.166      434.7     0.6
>  DD comm. load           3     10001         0.745        0.2     0.0
>  Send X to PME           3   1000001       249.003       74.5     0.1
>  Comm. coord.            3   1000001       637.329      190.8     0.3
>  Neighbor search         3    100001      8738.669     2616.0     3.5
>  Force                   3   1000001     99210.202    29699.2    39.2
>  Wait + Comm. F          3   1000001      3361.591     1006.3     1.3
>  PME mesh                3   1000001     66189.554    19814.2    26.2
>  Wait + Comm. X/F        3                60294.513     8049.5    23.8
>  Wait + Recv. PME F      3   1000001       801.897      240.1     0.3
>  Write traj.             3      1015        33.464       10.0     0.0
>  Update                  3   1000001      3295.820      986.6     1.3
>  Constraints             3   1000001      6317.568     1891.2     2.5
>  Comm. energies          3    100002        70.784       21.2     0.0
>  Rest                    3                 2314.844      693.0     0.9
> --------------------------------------------------------------------
>  Total                   6               252968.148    75727.5   100.0
> --------------------------------------------------------------------
> --------------------------------------------------------------------
>  PME redist. X/F         3   2000002      1945.551      582.4     0.8
>  PME spread/gather       3   2000002     37219.607    11141.9    14.7
>  PME 3D-FFT              3   2000002     21453.362     6422.2     8.5
>  PME solve               3   1000001      5551.056     1661.7     2.2
> --------------------------------------------------------------------
>
>  Parallel run - timing based on wallclock.
>
>                NODE (s)   Real (s)      (%)
>        Time:  12621.257  12621.257    100.0
>                          3h30:21
>                (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
> Performance:    388.633     28.773     13.691      1.753
>
> Finished mdrun on node 0 Wed Oct 20 22:01:14 2010
>
> ######################################
> Comparing the performance values for the two simulations, I saw that in
> "numeric terms" the simulation using the GPU gave (for example) ~27
> ns/day, while with mpi this value was approximately half (13.7 ns/day).
> However, when I compared the times at which each simulation
> started/finished, the mpi simulation took 211 minutes while the gpu
> simulation took 320 minutes to finish.
>
> My questions are:
>
> 1. Why, in the performance values, did I get better results with the GPU?

Your CPU version can probably be optimized a bit. You should use HT and run
on 12 threads. Make sure PME/PP is balanced and use the best
rlist/fourier_spacing ratio. Also, your PME accuracy is rather high; make
sure you really need it (a fourier spacing of 0.11 should be accurate enough
for an rlist of 0.9). Your PME nodes spent 23% of their time waiting on the
PP nodes.

> 2. Why was the simulation running on the GPU 109 min slower than on 6
> cores, given that my video card is a GTX 480 with 480 GPU cores? I was
> expecting the GPU to accelerate the simulations greatly.

The output you posted says the GPU version was faster (it ran for only
106 min). The CPU cores are much more powerful; I would expect them to be
about as fast as the GPU.

Roland
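
A minimal sketch of the retest suggested above (relax the PME grid, run on
all 12 hardware threads with a balanced PP/PME split), purely as an
illustration: the 0.11 nm spacing is only the rough value mentioned above,
the conf.gro/topol.top file names are placeholders, and the best PP/PME
split would still need tuning (e.g. with g_tune_pme) on this machine.

  # 1) In the .mdp, relax the PME grid while keeping the 0.9 nm cut-offs,
  #    which shifts some work from the PME mesh back to real space:
  #        fourierspacing = 0.11    ; was 0.10
  # 2) Regenerate the run input (placeholder structure/topology names):
  grompp -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr
  # 3) Run on all 12 hardware threads; without -npme, mdrun guesses the
  #    number of PME nodes itself (or use g_tune_pme to find a good split):
  mpirun -np 12 mdrun_mpi -s topol.tpr -v >& out &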