[gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?
Dear Gromacs Users, We're finally buying some Intel E5-2650 servers + NVIDIA GTX 980 cards. However, some of the servers come with only PCI-e 3.0 x8 slots and others with x16 slots. Do you think this is relevant for gromacs performance ? And if so, how relevant ? Thanks in advance. Cheers, David.
Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?
Hey Mirco, Is your 1-3% claim based on the webpage you linked ? Is it reliable to compare GPU performance for gromacs with that of 3D video games ? Thanks! 2015-06-11 13:21 GMT+02:00 Mirco Wahab mirco.wa...@chemie.tu-freiberg.de: On 11.06.2015 13:08, David McGiven wrote: We're finally buying some Intel E5-2650 servers + NVIDIA GTX 980 cards. However, some of the servers come with only PCI-e 3.0 x8 slots and others with x16 slots. Do you think this is relevant for gromacs performance ? And if so, how relevant ? It's more important to select PCIe 3.0 mode. Then, the difference between 16x and 8x is imho very low (1-3%). M. P.S.: http://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/
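For reference, whether a given card has actually negotiated an x16 or x8 link, and at which PCIe generation, can be checked on the node itself with nvidia-smi (a quick sketch; the exact section and label names vary a little between driver versions):

nvidia-smi -q -i 0 | grep -A 6 "GPU Link Info"

At PCIe 3.0 even an x8 link still provides roughly 8 GB/s of bandwidth, which is consistent with the small 1-3% differences reported in the techpowerup scaling test linked above.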
[gmx-users] GMX 5.0 compilation across different platforms
Dear All, I would like to statically compile GROMACS 5 on an Intel Xeon X3430 machine with gcc 4.7 (cluster front node) BUT run it on an Intel Xeon E5-2650V2 machine (cluster compute node). Would that be possible ? And if so, how should I do it ? I haven't found it on the Installation_Instructions webpage. Thanks in advance. BR, D.
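No recipe was posted in this thread, but a minimal sketch of how such a cross-node build is usually set up with GROMACS 5.0's CMake system follows (the option names are taken from the 5.0 build system as remembered and should be checked against the installation guide). Because the X3430 front node (Nehalem, no AVX) must not auto-detect the SIMD level, it is forced to what the E5-2650V2 compute node supports, and shared libraries are switched off:

cmake .. -DGMX_SIMD=AVX_256 \
         -DBUILD_SHARED_LIBS=OFF -DGMX_PREFER_STATIC_LIBS=ON \
         -DGMX_BUILD_OWN_FFTW=ON
make && make install

A fully static binary additionally requires static versions of the system libraries on the build host. Apart from that, gcc 4.7 can emit AVX code even though the front node cannot execute it, so the resulting mdrun should run on the E5-2650V2 node (and only on CPUs at least that new).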
Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
Hey Harry, Thanks for the caveat. Carsten Kutzner posted these results a few days ago. This is what he said :

I never benchmarked 64-core AMD nodes with GPUs. With an 80 k atoms test system using a 2 fs time step I get
24 ns/d on 64 AMD cores 6272
16 ns/d on 32 AMD cores 6380
36 ns/d on 32 AMD cores 6380 with 1x GTX 980
40 ns/d on 32 AMD cores 6380 with 2x GTX 980
27 ns/d on 20 Intel cores 2680v2
52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980

I think 20 Intel cores means 2 x 10 cores each. But Szilard just mentioned in this same thread : If you can afford them get the 14/16 or 18 core v3 Haswells, those are *really* fast, but a pair can cost as much as a decent car. I know for sure gromacs scales VERY well on 4 x 16 cores of the latest AMD (Interlagos, Bulldozer, etc.) machines, but I have no experience with Intel Xeon. Let's see what others can say. BR, D

2015-02-24 13:17 GMT+01:00 Harry Mark Greenblatt harry.greenbl...@weizmann.ac.il: Dear David, We did some tests with Gromacs and other programs on CPUs with core counts up to 16 per socket, and found that after about 12 cores jobs/threads begin to interfere with each other. In other words, there was a performance penalty when using core counts above 12. I don't have the details in front of me, but you should at the very least get a test machine and try running your simulations for short periods with 10, 12, 14, 16 and 18 cores in use to see how Gromacs behaves with these processors (unless someone has done these tests, and can confirm that Gromacs has no issues with 16 or 18 core CPUs). Harry
Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
Hi Szilard, Thank you very much for your great advice. 2015-02-20 19:03 GMT+01:00 Szilárd Páll pall.szil...@gmail.com: On Fri, Feb 20, 2015 at 2:17 PM, David McGiven davidmcgiv...@gmail.com wrote: Dear Gromacs users and developers, We are thinking about buying a new cluster of ten or twelve 1U/2U machines with 2 Intel Xeon CPUs, 8-12 cores each. Some of the 2600v2 or v3 series. Not yet clear the details, we'll see. If you can afford them get the 14/16 or 18 core v3 Haswells, those are *really* fast, but a pair can cost as much as a decent car. Get IVB (v2) if it saves you a decent amount of money compared to v3. The AVX2 with FMA of the Haswell chips is great, but if you run GROMACS with GPUs on them my guess is that a higher frequency v2 will be more advantageous than the v3's AVX2 support. Won't swear on this as I have not tested thoroughly. According to an email exchange I had with Carsten Kutzner, for the kind of simulations we would like to run (see below), lower frequency v2's give a better performance-to-price ratio. For instance, we can get from a national reseller :
2U server (supermicro rebranded I guess), 2 x E5-2699V3 18c 2.3 GHz, 64 GB DDR4, 2 x GTX 980 (certified for the server) - 13,400 EUR (excl. VAT)
2U server (supermicro rebranded I guess), 2 x E5-2695V2 12c 2.4 GHz, 64 GB DDR3, 2 x GTX 980 (certified for the server) - 9,140 EUR (excl. VAT)
Does that qualify as saving a decent amount of money to go for the V2 ? I don't think so, also because we care about rack space. Fewer servers, but potent ones. The latest Haswells are way too overpriced for us. We want to run molecular dynamics simulations of transmembrane proteins inside a POPC lipid bilayer. In a typical system we have ~10^5 atoms, of which almost 1/3 correspond to water molecules. We employ usual conditions with PME for electrostatics and cutoffs for LJ interactions. I think we'll go for the V3 version. I've been told in this list that NVIDIA GTX cards offer the best performance/price ratio for gromacs 5.0. Yes, that is the case. However, I am wondering ... How do you guys use the GTX cards in rackable servers ? GTX cards are consumer grade, for personal workstations, gaming, and so on, and it's nearly impossible to find any server manufacturer like HP, Dell, SuperMicro, etc. to certify that those cards will function properly in their servers. Certification can be an issue - unless you buy many and you can cut a deal with a company. There are some companies that do certify servers, but AFAIK most/all are US-based. I won't post a long advertisement here, but you can find many names if you browse NVIDIA's GPU computing site (and as a matter of fact the AMBER GPU site is quite helpful in this respect too). You can consider getting vanilla server nodes and plugging the GTX cards in yourself. In general, I can recommend Supermicro, they have pretty good value servers from 1 to 4U. The easiest is to use the latter because GTX cards will just fit vertically, but it will be a serious waste of rack-space. With a bit of tinkering you may be able to get GTX cards into 3U, but you'll either need cards with connectors on the back or 90 deg angled 4-pin PCIe power cables. Otherwise you can only fit the cards with PCIe risers and I have no experience with that setup, but I know some build denser machines with GTX cards. Cheers, -- Szilárd What are your views about this ? Thanks. Best Regards
Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
2015-02-24 15:46 GMT+01:00 Szilárd Páll pall.szil...@gmail.com: Perhaps he has seen some real results that do not show issues at 16 or 18 cores/socket, in which case they would be advantageous, if one can afford them. I am only going on what the manager of our cluster mentioned to me in his tests. But his tests were based on many different software packages, so perhaps Gromacs is less/not affected. OK, that's an entirely different claim than the one you made initially. I dare to say that it is dangerous to mix performance observations of many software packages - especially with that of GROMACS. Totally agree.
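For the core-count scan Harry suggests in this thread, a short loop over thread counts is usually enough to see where a socket stops scaling (a sketch; bench.tpr stands in for your own run input, and the flags are the usual 4.6/5.0 mdrun ones, worth checking against your version):

for n in 10 12 14 16 18; do
    mdrun -nt $n -s bench.tpr -deffnm bench_${n}cores -maxh 0.1 -resethway -noconfout
done
grep -H "Performance:" bench_*cores.log

The -resethway flag discards the first half of the run from the timings, so load balancing has settled before the ns/day figure is taken.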
Re: [gmx-users] Cluster recommendations
Hi Carsten, Sorry I just saw your message today. Thank you very much for the details. Cheers

2015-02-02 14:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de: Hi David, On 22 Jan 2015, at 18:01, David McGiven davidmcgiv...@gmail.com wrote: Hey Carsten, Just another question. What do you think will be the performance difference between two gromacs runs with a ~100k atoms system like the one I mentioned in my first email :
- 1 server with 4 AMD processors, 16 cores each (64 cores) with no GPU
- 1 server with 4 AMD processors, 16 cores each (64 cores) with one GTX 980 GPU
- 1 server with 2 Intel processors, 10 cores each (20 cores) like the ones you mentioned, with one or two GTX 980 GPUs.
I'm not interested in exact performance numbers, I just need to understand the logistics behind the CPU/GPU combinations in order to make an intelligent cluster purchase.

I never benchmarked 64-core AMD nodes with GPUs. With an 80 k atoms test system using a 2 fs time step I get
24 ns/d on 64 AMD cores 6272
16 ns/d on 32 AMD cores 6380
36 ns/d on 32 AMD cores 6380 with 1x GTX 980
40 ns/d on 32 AMD cores 6380 with 2x GTX 980
27 ns/d on 20 Intel cores 2680v2
52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980
So unless you can get the AMD nodes very cheap, probably the 20-core Intel nodes with 1 or 2 GPUs will give you the best performance and the best performance/price. Best, Carsten

2015-01-16 14:46 GMT+01:00 Carsten Kutzner ckut...@gwdg.de: Hi David, On 16 Jan 2015, at 12:28, David McGiven davidmcgiv...@gmail.com wrote: Hi Carsten, Thanks for your answer. 2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de: Hi David, we are just finishing an evaluation to find out which is the optimal hardware for Gromacs setups. One of the input systems is an 80,000 atom membrane channel system and thus nearly exactly what you want to compute. The biggest benefit you will get by adding one or two consumer-class GPUs to your nodes (e.g. NVIDIA GTX 980). That will typically double your performance-to-price ratio. This is true for Intel as well as for AMD nodes, however the best ratio in our tests was observed with 10-core Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980, ideally two of those CPUs with two GPUs on a node. Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to the 2690v2, for that matter. There's a significant difference in price indeed. Usually the percent improvement in Gromacs performance is not as much as the percent improvement in clock speed, so the cheaper ones will give you a higher performance-to-price ratio. I'm also wondering if the performance would be better with 16-core Intels instead of 10-core, i.e. the E5-2698 v3. Didn't test those. I would like to know which other tests you have done. What about AMD ? We tested AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same performance-to-price ratio as a 10-core Intel 2680v2 node with one GTX 980. The Intel node gives you a higher per-node performance, though. Unless you want to buy expensive FDR14 Infiniband, scaling across two or more of those nodes won't be good (~0.65 parallel efficiency across 2, ~0.45 across 4 nodes using QDR infiniband), so I would advise against it and go for more sampling on single nodes. Well, that puzzles me. Why is it that you get poor performance ? Are you talking about pure CPU jobs over infiniband, or are you talking about CPU+GPU jobs over infiniband ? How come you won't get good performance if a great percentage of supercomputer centers in the world use InfiniBand ? And I'm sure lots of users here in the list use gromacs over Infiniband. For a given network (e.g. QDR Infiniband), the scaling is better the lower the performance of the individual nodes. So for CPU-only nodes you will get better scaling than for CPU+GPU nodes, which have a way higher per-node performance. The performance is good, it is just that the parallel efficiency is not optimal for an MD system of 100,000 atoms, meaning you do not get two times the performance on two nodes in parallel as compared to the aggregated performance of two individual runs. Bigger systems will have a better parallel efficiency. I do, too :) But you get more trajectory for your money if you can wait and run on a single node. Carsten
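To put Carsten's efficiency figures into concrete numbers: if one 20-core 2680v2 node with a GTX 980 gives about 52 ns/d, then ~0.65 parallel efficiency across two such nodes means one coupled job reaches roughly 0.65 x 2 x 52 ≈ 68 ns/d, while the same two nodes running two independent simulations produce 2 x 52 = 104 ns/d of aggregate trajectory. (Illustrative arithmetic only; the quoted efficiencies were measured for CPU+GPU nodes over QDR InfiniBand and will differ for other systems and networks.)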
[gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
Dear Gromacs users and developers, We are thinking about buying a new cluster of ten or twelve 1U/2U machines with 2 Intel Xeon CPUs, 8-12 cores each. Some of the 2600v2 or v3 series. Not yet clear the details, we'll see. I've been told in this list that NVIDIA GTX cards offer the best performance/price ratio for gromacs 5.0. However, I am wondering ... How do you guys use the GTX cards in rackable servers ? GTX cards are consumer grade, for personal workstations, gaming, and so on, and it's nearly impossible to find any server manufacturer like HP, Dell, SuperMicro, etc. to certify that those cards will function properly in their servers. What are your views about this ? Thanks. Best Regards
Re: [gmx-users] Cluster recommendations
Hey Carsten, Just another question. What do you think will be the performance difference between two gromacs runs with a ~100k atoms system like the one I mentioned in my first email :
- 1 server with 4 AMD processors, 16 cores each (64 cores) with no GPU
- 1 server with 4 AMD processors, 16 cores each (64 cores) with one GTX 980 GPU
- 1 server with 2 Intel processors, 10 cores each (20 cores) like the ones you mentioned, with one or two GTX 980 GPUs.
I'm not interested in exact performance numbers, I just need to understand the logistics behind the CPU/GPU combinations in order to make an intelligent cluster purchase. Thanks again. Best, D
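As a side note, on a node like the third one (2 x 10 cores plus two GTX 980s) the usual gromacs 5.0 launch is one thread-MPI rank per GPU with the remaining cores as OpenMP threads, something like the following (a sketch; topol.tpr is a placeholder, and mdrun will normally pick an equivalent split on its own):

mdrun -ntmpi 2 -ntomp 10 -gpu_id 01 -s topol.tpr -deffnm run1

The explicit -ntmpi/-ntomp/-gpu_id flags mainly matter when several independent runs have to share one node without stepping on each other's cores and GPUs.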
Re: [gmx-users] Cluster recommendations
Sorry, where it says "between two gromacs runs" I should have said "three gromacs runs", one for each combination of CPU/GPU.
Re: [gmx-users] Cluster recommendations
Thank you very much Carsten.
Re: [gmx-users] Cluster recommendations
Hi Carsten, Thanks for your answer. 2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de: Hi David, we are just finishing an evaluation to find out which is the optimal hardware for Gromacs setups. One of the input systems is an 80,000 atom membrane channel system and thus nearly exactly what you want to compute. The biggest benefit you will get by adding one or two consumer-class GPUs to your nodes (e.g. NVIDIA GTX 980). That will typically double your performance-to-price ratio. This is true for Intel as well as for AMD nodes, however the best ratio in our tests was observed with 10-core Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980, ideally two of those CPUs with two GPUs on a node. Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to the 2690v2, for that matter. There's a significant difference in price indeed. I'm also wondering if the performance would be better with 16-core Intels instead of 10-core, i.e. the E5-2698 v3. I would like to know which other tests you have done. What about AMD ? Unless you want to buy expensive FDR14 Infiniband, scaling across two or more of those nodes won't be good (~0.65 parallel efficiency across 2, ~0.45 across 4 nodes using QDR infiniband), so I would advise against it and go for more sampling on single nodes. Well, that puzzles me. Why is it that you get poor performance ? Are you talking about pure CPU jobs over infiniband, or are you talking about CPU+GPU jobs over infiniband ? How come you won't get good performance if a great percentage of supercomputer centers in the world use InfiniBand ? And I'm sure lots of users here in the list use gromacs over Infiniband. Thanks again. Best Regards, D Best, Carsten
-- Dr. Carsten Kutzner Max Planck Institute for Biophysical Chemistry Theoretical and Computational Biophysics Am Fassberg 11, 37077 Goettingen, Germany Tel. +49-551-2012313, Fax: +49-551-2012302 http://www.mpibpc.mpg.de/grubmueller/kutzner http://www.mpibpc.mpg.de/grubmueller/sppexa
[gmx-users] Cluster recommendations
Dear Gromacs Users, We've got some funding to build a new cluster. It's going to be used mainly for gromacs simulations (80% of the time). We run molecular dynamics simulations of transmembrane proteins inside a POPC lipid bilayer. In a typical system we have ~10^5 atoms, of which almost 1/3 correspond to water molecules. We employ usual conditions with PME for electrostatics and cutoffs for LJ interactions. I would like to hear your advice on which kind of machines are the best bang-for-the-buck for that kind of simulations. For instance :
- Intel or AMD ? My understanding is that Intel is faster but expensive, and AMD is slower but cheaper. So in the end you get almost the same performance per buck. Right ?
- Many CPUs/cores per machine, or fewer ? My understanding is that the more cores per machine, the lower the costs. One machine is always cheaper to buy and maintain than several. Plus maybe you can save the costs of Infiniband if you use large core densities ?
- Should we invest in an Infiniband network to run jobs across multiple nodes ? Will the kind of simulations we run benefit from multiple nodes ?
- Would we benefit from adding GPUs to the cluster ? If so, which ones ?
We now have a cluster with 48 and 64 AMD Opteron cores per machine (4 processors per machine) and we run our gromacs simulations there. We don't use MPI because our jobs are mostly run on a single node. As I said, with 48 or 64 cores per simulation on a single machine. So far, we're quite satisfied with the performance we get. Any advice will be greatly appreciated. Best Regards, D.
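For readers wondering what the "usual conditions" amount to, an illustrative .mdp fragment for a PME + cut-off setup might look like the following (values are placeholders for illustration only, not a recommendation for any particular force field or system):

coulombtype    = PME
rcoulomb       = 1.2
vdwtype        = Cut-off
rvdw           = 1.2
fourierspacing = 0.12

The relevance for hardware choice is that with PME the long-range part runs as 3D FFTs on the CPU (on separate PME ranks when running in parallel), while the short-range nonbonded part is what a GPU can offload.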
Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
Thank you very much to all of you. That should explain the difference in performance. I'll also discuss it with a more gromacs-knowledgeable colleague of mine. Best Regards.

2014-09-06 8:58 GMT+02:00 Abhi Acharya abhi117acha...@gmail.com: Thank you Mark and Szilard for your replies. It gave more clarity on how the new gromacs works, especially its greater support for streamed computing. I hope David's problem is sorted too. :) Thanks again, Regards, Abhishek Acharya

On Fri, Sep 5, 2014 at 10:45 PM, Szilárd Páll pall.szil...@gmail.com wrote: On Fri, Sep 5, 2014 at 6:40 PM, Abhishek Acharya abhi117acha...@gmail.com wrote: Dear Mark, Thank you for the insightful reply. In the manual for gromacs 5.0 it was mentioned that the verlet scheme is better for GPU systems. More correctly, only the Verlet scheme supports GPU acceleration. The algorithms used by the group scheme are not appropriate for GPUs or other wide-SIMD accelerators. Does that mean that we should give up on the group scheme, even though we get good performance compared to verlet? That's up to you to decide. The algorithms are different, the group scheme does not use a buffer by default, while the verlet scheme does and aims to control the drift (and keep it quite low by default). The future plan of removing the group cut-off scheme indicates that it must have been associated with a high cost-benefit ratio. What makes you conclude that? The reasons are described here: http://www.gromacs.org/Documentation/Cut-off_schemes In very brief summary: i) the group scheme is not suitable for accelerators and wide SIMD architectures, ii) energy conservation comes with a high performance penalty, iii) it is inconvenient for high parallelization as it increases load imbalance. Cheers, -- Szilárd Could you please shed a little light on this ? Thanks. Regards, Abhishek

-Original Message- From: Mark Abraham mark.j.abra...@gmail.com Sent: 9/5/2014 7:57 PM To: Discussion list for GROMACS users gmx-us...@gromacs.org Subject: Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ? This cutoff-scheme difference is probably caused by using an .mdp file that does not specify the cutoff scheme, and the default changed in 5.0. grompp issued a note about this, if you go and check it. The change in the -npme choice is a direct consequence of this; the heuristics underlying the splitting choice approximately understand the relative performance characteristics of the two implementations, and you can see that in practice the reported PP/PME balance is decent in each case. There is indeed a large chunk of water (which you can see in group-scheme log files, e.g. the line in the FLOP accounting that says NB VdW & Elec. [W3-W3,F] dominates the cost), and David's neighbour list is unbuffered. This is indeed the regime where the group scheme might still out-perform the Verlet scheme (depending whether you value buffering in the neighbour list, which you generally should!). Mark

On Fri, Sep 5, 2014 at 4:06 PM, Abhi Acharya abhi117acha...@gmail.com wrote: Hello, Is your system solvated with water molecules? The reason I ask is that, in the case of the run with 4.6.5, gromacs has used the group cut-off scheme, whereas 5.0 has used the verlet scheme. For systems with water molecules, the group scheme gives better performance than verlet.
For more check out: http://www.gromacs.org/Documentation/Cut-off_schemes Regards, Abhishek Acharya On Fri, Sep 5, 2014 at 7:28 PM, Carsten Kutzner ckut...@gwdg.de wrote: Hi, you might want to use g_tune_pme to find out the optimal number of PME nodes for 4.6 and 5.0. Carsten
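Following up on Mark's point about the changed default: the scheme can be pinned explicitly in the .mdp so that 4.6.5 and 5.0 run the same algorithm (a sketch; grompp will complain if other settings are inconsistent with the choice):

cutoff-scheme = group    ; reproduce the old 4.6.5 default for an apples-to-apples comparison
; cutoff-scheme = Verlet ; the 5.0 default: buffered pair lists, required for GPU acceleration

With the scheme held fixed, any remaining difference between the two versions is mostly down to the PP/PME rank split discussed below.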
[gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
Dear Gromacs users, I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2), same OS (RHEL 6), same configuration options and basically everything else the same as my previous gromacs 4.6.5 compilation, and when running one of our typical simulations I get worse performance.
4.6.5 does 45 ns/day
5.0 does 35 ns/day
Do you have any idea of what could be happening ? Thanks. Best Regards, D.
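Before comparing run settings it is worth confirming that the two binaries really were built the same way; both versions print their build configuration (a sketch; in 5.0 the same information is also reachable through the gmx wrapper binary):

mdrun -version

The output lists, among other things, the compiler and flags, precision, FFT library and the SIMD/acceleration level the binary was built for, which is the quickest way to spot a build that silently fell back to slower kernels.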
Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
Command line in both cases is :
1st : grompp -f grompp.mdp -c conf.gro -n index.ndx
2nd : mdrun -nt 48 -v -c test.out
Log file, you mean the standard output/error ? Attached to the email ? Thanks

2014-09-05 12:30 GMT+02:00 Szilárd Páll pall.szil...@gmail.com: Please post the command lines you used to invoke mdrun as well as the log files of the runs you are comparing. Cheers, -- Szilárd
Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
Thanks Szilard, here they are :
4.6.5 : http://pastebin.com/nqBn3FKs
5.0 : http://pastebin.com/kR4ntHtK

2014-09-05 12:47 GMT+02:00 Szilárd Páll pall.szil...@gmail.com: mdrun writes a log file, named md.log by default, which contains among other things the results of hardware detection and performance measurements. The list does not accept attachments, please upload it somewhere (dropbox, pastebin, etc.) and post a link. Cheers, -- Szilárd
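For anyone repeating this kind of comparison, the interesting parts of the two md.log files are the rank split reported near the top and the timing table at the end; something like the following pulls the headline numbers out (a sketch; the exact wording of the log lines differs slightly between 4.6 and 5.0):

grep -i "pme" md.log | head
grep "Performance:" md.log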
Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
Hi Abhi, Yes I noticed that imbalance but I thought gromacs knew better than the user how to split PP/PME!! How is it possible that 4.6.5 guesses better than 5.0 ? Anyway, I tried :
mdrun -nt 48 -v -c test.out
Exits with an error : You need to explicitly specify the number of MPI threads (-ntmpi) when using separate PME ranks
Then :
mdrun -ntmpi 48 -v -c TEST_md.gro -npme 12
Then again 35 ns/day with the warning : NOTE: 8.5 % performance was lost because the PME ranks had less work to do than the PP ranks. You might want to decrease the number of PME ranks or decrease the cut-off and the grid spacing. I don't know much about Gromacs so I am puzzled.

2014-09-05 14:32 GMT+02:00 Abhi Acharya abhi117acha...@gmail.com: Hello, From the log files it is clear that out of 48 cores, the 5.0 run had 8 cores allocated to PME while the 4.6.5 run had 12 cores. This seems to have caused a greater load imbalance in the case of the 5.0 run. If you look at the last table in both log files, you will notice that the PME spread/gather wall-time values for 5.0 are more than double the wall-time values for 4.6.5. Try running the simulation by explicitly setting the -npme flag to 12. Regards, Abhishek Acharya
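Rather than scanning -npme values by hand, the g_tune_pme tool that ships with 4.6 and 5.0 automates exactly this search, as Carsten suggests elsewhere in this thread; a sketch of a typical invocation (flag names from memory, check g_tune_pme -h, and topol.tpr is a placeholder):

g_tune_pme -np 48 -s topol.tpr -launch

It runs short test simulations over a range of PME rank counts (and optionally scaled cut-off/grid settings) and reports which split gives the best ns/day, which is a more reliable way to recover the 4.6.5-level performance than guessing.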
Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?
What is even more strange is that I tried with 16 PME ranks (mdrun -ntmpi 48 -v -c TEST_md.gro -npme 16), got a 15.8% performance loss and the ns/day are very similar : 33 ns/day D.