Let's not hijack James' thread as your hardware is different from his.

On Tue, Nov 5, 2013 at 11:00 PM, Dwey Kauffman <mpi...@gmail.com> wrote:
> Hi Szilard,
>
> Thanks for your suggestions. I am indeed aware of this page. On an 8-core
> AMD machine with 1 GPU, I am very happy with its performance. See below.

Actually, I was jumping to conclusions too early: as you mentioned an AMD
"cluster", I assumed you must have 12-16-core Opteron CPUs. If you have an
8-core (desktop?) AMD CPU, then you may not need to run more than one rank
per GPU.

> My intention is to obtain even better performance because we have
> multiple nodes.

Btw, I'm not sure it's an economically viable solution to install an
InfiniBand network - especially if you have desktop-class machines. Such a
network will end up costing >$500 per machine just for a single network
card, let alone cabling and switches.

> ### 8-core AMD with 1 GPU
>
> Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
> For optimal performance this ratio should be close to 1!
>
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>       performance loss, consider using a shorter cut-off and a finer PME
>       grid.
>
>                Core t (s)   Wall t (s)      (%)
>      Time:    216205.510    27036.812    799.7
>                            7h30:36
>               (ns/day)     (hour/ns)
> Performance:     31.956        0.751
>
> ### 8-core AMD with 2 GPUs
>
>                Core t (s)   Wall t (s)      (%)
>      Time:    178961.450    22398.880    799.0
>                            6h13:18
>               (ns/day)     (hour/ns)
> Performance:     38.573        0.622
> Finished mdrun on node 0 Sat Jul 13 09:24:39 2013

Indeed, as Richard pointed out, I was asking for *full* logs; these
summaries can't tell much. The table above the summary entitled
"R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G", as well as
other information reported across the log file, is what I need to assess
your simulations' performance.

>> However, in your case I suspect that the bottleneck is multi-threaded
>> scaling on the AMD CPUs and you should probably decrease the number of
>> threads per MPI rank and share GPUs between 2-4 ranks.
>
> OK, but can you give an example of the mdrun command, given an 8-core AMD
> machine with 2 GPUs? I will try to run it again.

You could try running
    mpirun -np 4 mdrun -ntomp 2 -gpu_id 0011
but I suspect this won't help, because your scaling issue likely lies
elsewhere.
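In case it helps, here are a couple of concrete launch lines to experiment
with (just a sketch, assuming a GROMACS 4.6 thread-MPI build using the
Verlet scheme; topol.tpr is a placeholder input, and each digit of the
-gpu_id string maps one rank to a GPU, so "0011" puts two ranks on each GPU):

    # 2 thread-MPI ranks x 4 OpenMP threads, one rank per GPU
    mdrun -ntmpi 2 -ntomp 4 -gpu_id 01 -s topol.tpr

    # 4 thread-MPI ranks x 2 OpenMP threads, two ranks sharing each GPU
    mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011 -s topol.tpr

Also note that mdrun's automated PME tuning (-tunepme, on by default in
4.6) should already be working on the cut-off/PME-grid imbalance that the
NOTE in your log complains about.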
>> Regarding scaling across nodes, you can't expect much from gigabit
>> ethernet - especially not from the cheaper cards/switches. In my
>> experience, even reaction-field runs don't scale across nodes with 10G
>> ethernet if you have more than 4-6 ranks per node trying to communicate
>> (let alone with PME). However, on InfiniBand clusters we have seen
>> scaling to 100 atoms/core (at peak).
>
> From your comments, it sounds like a cluster of AMD CPUs is difficult to
> scale across nodes in our current setup.
>
> Let's assume we install InfiniBand (20 or 40 Gb/s) in the same system of
> 16 nodes, each an 8-core AMD machine with 1 GPU. Considering the same AMD
> system, what is a good way to obtain better performance when we run a
> task across nodes? In other words, what does the mdrun_mpi command look
> like?
>
> Thanks,
> Dwey
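To sketch what such a multi-node launch could look like (an illustration
only, assuming an Open MPI mpirun and an MPI-enabled mdrun_mpi build; the
-npernode flag and the hostfile nodes.txt are hypothetical and depend on
your MPI setup), you could start one rank per node, each driving its node's
single GPU with 8 OpenMP threads:

    # 16 nodes, 1 rank per node, 8 threads per rank, GPU 0 on every node
    mpirun -np 16 -npernode 1 -hostfile nodes.txt \
        mdrun_mpi -ntomp 8 -gpu_id 0 -s topol.tpr

With PME, scaling past a few nodes usually also requires dedicating some
ranks to PME (see the -npme option), and as noted above, without
InfiniBand the network will most likely remain the limiting factor.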