Re: [gmx-users] Performance of beowulf cluster
On Tue, Aug 5, 2014 at 5:21 PM, Abhi Acharya abhi117acha...@gmail.com wrote:

> Thank you Mirco and Szilard. With regards to the GPU system, I have decided on a Xeon E5-1650 v2 system with a GeForce GTX 780 Ti GPU for equilibration and production runs with small systems. But for large systems or REMD simulations, I am a bit skeptical about banking on GPU systems.

How would you define large? A 100k protein system (PME, rc=0.9, vsites, 5 fs) will run 50 ns/day on a box like the above, but ~5x (!) slower on an FX 8350 without a GPU! Some numbers I had around, plus the CPU ones I got from some quick-and-dirty benchmark runs (with/without GPU):

i7 3930K +/- K20: 52/17.5 ns/day
FX 8350 +/- GTX 580: 31.5/10.1 ns/day

I think the above Xeon may not be the best deal: it is based on the now outdated Sandy Bridge architecture, and an i7 4930K will be around 10% faster; depending on your timeline, the Haswell 5930K (released this fall) will be *far* better than either. Additionally, unless the AMD CPUs are very cheap, my guess is that you'll get better performance per buck (and per W too) with mid-range Haswells like the i5 4670/4690.

> Any pointers as to what would be the minimum configuration required for REMD simulations on, say, a 50 K atom protein sampled at 100 different temperatures? I am open to all possible options in this regard (obviously a little cost effectiveness does not harm).

For a 100-way multi-run you'll need at least 100 cores, and even with fast ones you won't get too good performance - especially without GPUs. In fact, if you are planning to do REMD runs, you can make great use of GPUs!
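As a sanity check on the figures quoted above (the numbers are taken verbatim from this thread, not independently measured), the implied GPU speedup is roughly 3x on both platforms:

```python
# GPU speedup implied by the benchmark numbers quoted above:
# (ns/day with GPU, ns/day without GPU). Figures are from the email,
# not independently measured.
benchmarks = {
    "i7 3930K + K20":    (52.0, 17.5),
    "FX 8350 + GTX 580": (31.5, 10.1),
}

for name, (with_gpu, without_gpu) in benchmarks.items():
    speedup = with_gpu / without_gpu
    print(f"{name}: {speedup:.1f}x")   # ~3x on both platforms
```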
The aggregate performance of independent runs sharing a GPU (but not CPU cores) can be much greater than what you can achieve with a single run on the same GPU-CPU pair; for an example, see the second plot on this poster: http://goo.gl/2xH52y

Hardware-wise, with mid-range desktop Haswell CPUs, I guess you can get about 25 ns/day, and ~75 ns/day if you add a (fast enough) GPU; you can bump this by another ~20% (aggregate) if you run 2-4 independent runs per node. NOTE: I can't vouch for any of these numbers, they're guesstimates.

> Also, would investing in a *good* 40 Gigabit Ethernet network ensure good performance if we later plan to add more nodes to the cluster?

As I wrote before, I personally don't have experience with MD over Ethernet. Traditionally, Ethernet has always been considered borderline useless, but with the RDMA protocol iWARP over 10 and 40 Gb Ethernet, I've seen people report decent results.
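For the 100-way REMD setup discussed above, the replica temperatures are commonly spaced geometrically (constant ratio between neighbors), a rule of thumb that keeps exchange probabilities roughly uniform across the ladder. A minimal sketch; the 300-500 K range is purely illustrative and not a recommendation from this thread:

```python
# Geometrically spaced REMD temperature ladder (common rule of thumb for
# roughly uniform neighbor exchange probabilities). The 300-500 K range
# is illustrative only.
def temperature_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures with a constant ratio between neighbors."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]

ladder = temperature_ladder(300.0, 500.0, 100)
print(f"{ladder[0]:.1f} K ... {ladder[-1]:.1f} K, {len(ladder)} replicas")
print(f"neighbor ratio: {ladder[1] / ladder[0]:.5f}")
```

In practice the spacing (and hence the replica count) should be tuned to the system's heat capacity so that neighbor exchange rates come out around 20-30%; the geometric ladder is only a starting point.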
Cheers,
--
Szilárd
Re: [gmx-users] Performance of beowulf cluster
Thank you Dr. Szilard, this was really helpful. Incidentally, we eventually decided on an i7-4930K, so we got that right ;). As advised, we have now junked the idea of an Ethernet cluster. We will be testing the GPU systems first, and then we will decide on the further course of action. Thanks again.

Regards,
Abhishek
Re: [gmx-users] Performance of beowulf cluster
On 05.08.2014 07:01, Abhishek Acharya wrote:

> I am planning on investing in a beowulf cluster with 6 nodes (48 cores), each with an AMD FX 8350 processor and 8 GB memory, connected by a 1 Gigabit Ethernet switch. Although I plan to add more cores to this cluster later on, what is the max performance expected from the current specs for a 100,000 atom simulation box? Also, is it better to invest in a single 48 core server? The cluster system can be set up at almost half the price of a 48 core server, but do we lose out on performance in the process?

6 AMD FX-8350 boxes, connected to *one* 1 Gb switch? This system could be put to very good use if you are able to perform 6 *independent simulations* on your molecular system. 100,000 atoms is a rather small system for large-scale parallelization - a 100K SPC box would have an edge length of about 10 nm.

If it's important for you to have parallel runs on single molecular systems, you could consider a dual-socket-2011 system running 6-core i7 processors (i7 4930K or the upcoming Haswell-E 5930K) combined with quad-channel DDR3/4. This would give you a 24x parallelization on a single workstation. What about modern (Nvidia) consumer graphics cards? These are supported very well by Gromacs.

Regards,
M.
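The ~10 nm estimate above checks out with a back-of-envelope calculation, assuming the 100,000 atoms are pure SPC water at ambient density:

```python
# Back-of-envelope check of the ~10 nm edge length for a 100,000-atom
# SPC water box, assuming pure water at ambient density (~0.997 g/cm^3).
N_A = 6.022e23      # Avogadro's number, 1/mol
M_WATER = 18.015    # molar mass of water, g/mol
RHO = 0.997         # density of water, g/cm^3

atoms = 100_000
molecules = atoms / 3                          # SPC water: 3 atoms/molecule
volume_cm3 = molecules * M_WATER / (RHO * N_A)
volume_nm3 = volume_cm3 * 1e21                 # 1 cm^3 = 1e21 nm^3
edge_nm = volume_nm3 ** (1.0 / 3.0)
print(f"edge length: {edge_nm:.1f} nm")        # ~10 nm, matching the estimate
```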
Re: [gmx-users] Performance of beowulf cluster
Hi,

You need a fast network to parallelize across multiple nodes. 1 Gb Ethernet won't work well, and even 10/40 Gb Ethernet needs to be of good quality; you'd likely need to buy separate adapters, as the on-board ones won't perform well. I posted some links related to this to the list a few days ago.

The AMD FX desktop hardware you mention is OK, but I'm not sure that it gives the best performance/price. A (very) discounted Sandy Bridge-E (i7 3930K), or the cheaper Haswells like the i5 4670, may actually provide better performance for the money. Ivy Bridge-E or Haswell-E, as Mirco suggests, are the best single-socket workstation options, but those are/will be pretty expensive.

Finally, unless you have a good reason not to, you should not just consider GPUs, but consider what CPU/platform works best with GPUs.

Cheers,
--
Szilárd
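The trade-off behind this advice - one tightly coupled run over slow Ethernet versus independent runs per node - can be made concrete with a toy calculation. Both the per-node rate and the parallel-efficiency figure below are purely illustrative assumptions, not measurements from this thread:

```python
# Toy comparison: six nodes running independent simulations vs. one
# tightly coupled run spread across all nodes over slow Ethernet.
# The per-node rate and the parallel efficiency are illustrative
# assumptions, not benchmarks.
nodes = 6
per_node_ns_day = 10.0   # hypothetical single-node throughput
parallel_eff = 0.3       # hypothetical scaling efficiency over 1 Gb Ethernet

aggregate = nodes * per_node_ns_day       # 6 independent runs, summed
coupled = aggregate * parallel_eff        # one run across all 6 nodes

print(f"independent: {aggregate:.0f} ns/day aggregate")
print(f"coupled:     {coupled:.0f} ns/day")
```

Under any efficiency below 100%, the aggregate of independent runs wins; the slower the interconnect, the larger the gap, which is why ensemble-style workloads (REMD, independent replicas) suit commodity clusters far better than one large coupled run.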
Re: [gmx-users] Performance of beowulf cluster
Thank you Mirco and Szilard. With regards to the GPU system, I have decided on a Xeon E5-1650 v2 system with a GeForce GTX 780 Ti GPU for equilibration and production runs with small systems. But for large systems or REMD simulations, I am a bit skeptical about banking on GPU systems.

Any pointers as to what would be the minimum configuration required for REMD simulations on, say, a 50 K atom protein sampled at 100 different temperatures? I am open to all possible options in this regard (obviously a little cost effectiveness does not harm).

Also, would investing in a *good* 40 Gigabit Ethernet network ensure good performance if we later plan to add more nodes to the cluster?

Regards,
Abhishek

--
Abhishek Acharya
Senior Research Fellow
Gene Regulation Laboratory
National Institute of Immunology
[gmx-users] Performance of beowulf cluster
Hello gromacs users,

I am planning on investing in a beowulf cluster with 6 nodes (48 cores), each with an AMD FX 8350 processor and 8 GB memory, connected by a 1 Gigabit Ethernet switch. Although I plan to add more cores to this cluster later on, what is the max performance expected from the current specs for a 100,000 atom simulation box? Also, is it better to invest in a single 48 core server? The cluster system can be set up at almost half the price of a 48 core server, but do we lose out on performance in the process?

Regards,
Abhishek Acharya

--
Gromacs Users mailing list
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.