[gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-11 Thread David McGiven
Dear Gromacs Users,

We're finally buying some Intel E5-2650 servers + NVIDIA GTX 980 cards.

However, some of the servers come with only PCI-e 3.0 x8 slots, while
others have x16 slots.

Do you think this is relevant for GROMACS performance ? And if so, how
relevant ?

Thanks in advance.

Cheers,
David.
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Performance of NVIDIA GTX980 in PCI-e 3.0 x8 or x16 slots ?

2015-06-11 Thread David McGiven
Hey Mirco,

Is your 1-3% claim based on the webpage you linked ?

Is it reliable to compare GPU performance for GROMACS with that of 3D
video games ?

Thanks!

2015-06-11 13:21 GMT+02:00 Mirco Wahab mirco.wa...@chemie.tu-freiberg.de:

 On 11.06.2015 13:08, David McGiven wrote:

 We're finally buying some Intel E5-2650 servers + NVIDIA GTX 980 cards.
 However, some of the servers come with only PCI-e 3.0 x8 slots, while
 others have x16 slots.
 Do you think this is relevant for GROMACS performance ? And if so, how
 relevant ?


 It's more important to select PCIe 3.0 mode. Then, the difference
 between 16x and 8x is imho very low (1-3%).

 M.


 P.S.:
 http://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/


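A quick way to settle the x8/x16 question for a given machine is to check what the GTX 980 actually negotiates at runtime. A minimal sketch, assuming the NVIDIA driver is installed and that 01:00.0 stands in for the card's real PCI bus ID:

  # find the card's bus ID, then compare link capability vs. current link status
  lspci | grep -i nvidia
  lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'

  # the NVIDIA driver reports the same thing (PCIe generation and link width)
  nvidia-smi -q | grep -A 8 'GPU Link Info'

LnkSta shows the speed (8 GT/s for PCIe 3.0) and width (x8 or x16) the slot is actually running at, which is what the 1-3% estimate above refers to.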


[gmx-users] GMX 5.0 compilation across different platforms

2015-02-27 Thread David McGiven
Dear All,

I would like to statically compile GROMACS 5 on an Intel Xeon X3430 machine
with gcc 4.7 (cluster front node) BUT run it on an Intel Xeon E5-2650 v2
machine (cluster compute node).

Would that be possible ? And if so, how should I do it ?

I haven't found this on the Installation_Instructions web page.

Thanks in advance.

BR,
D.
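Building on the front node for a newer compute node mostly comes down to selecting the SIMD level of the machine the binary will run on, plus static linking. A minimal sketch of the configure step, assuming a GROMACS 5.0 source tree and that the Ivy Bridge E5-2650 v2 node is the run target; the option names are the 5.x ones and worth double-checking against the installation guide:

  # from an empty build directory inside the unpacked GROMACS 5.0 sources;
  # GMX_SIMD picks the SIMD level of the *run* host (Ivy Bridge => AVX_256),
  # the two static-lib options avoid shared-library dependencies on the nodes
  mkdir build && cd build
  cmake .. -DGMX_SIMD=AVX_256 \
           -DBUILD_SHARED_LIBS=OFF -DGMX_PREFER_STATIC_LIBS=ON \
           -DGMX_BUILD_OWN_FFTW=ON \
           -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-5.0
  make -j 8 && make install

The resulting mdrun will not run on the AVX-less X3430 front node itself, so test runs have to happen on the compute nodes.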


Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?

2015-02-24 Thread David McGiven
Hey Harry,

Thanks for the caveat. Carsten Kutzner posted these results a few days ago.
This is what he said :

I never benchmarked 64-core AMD nodes with GPUs. With an 80 k atom test
 system using a 2 fs time step I get
 24 ns/d on 64 AMD   cores 6272
 16 ns/d on 32 AMD   cores 6380
 36 ns/d on 32 AMD   cores 6380   with 1x GTX 980
 40 ns/d on 32 AMD   cores 6380   with 2x GTX 980
 27 ns/d on 20 Intel cores 2680v2
 52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
 62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980


I think 20 Intel cores means 2 CPUs with 10 cores each.

But Szilard just mentioned in this same thread :

If you can afford them get the 14/16 or 18 core v3 Haswells, those are
 *really* fast, but a pair can cost as much as a decent car.



I know for sure GROMACS scales VERY well on the latest 4 x 16-core AMD
(Interlagos, Bulldozer, etc.) machines, but I have no experience with Intel
Xeon.

Let's see what others can say.

BR,
D

2015-02-24 13:17 GMT+01:00 Harry Mark Greenblatt 
harry.greenbl...@weizmann.ac.il:

 BSD

 Dear David,

   We did some tests with Gromacs and other programs on CPUs with core
 counts up to 16 per socket, and found that after about 12 cores
 jobs/threads begin to interfere with each other.  In other words there was
 a performance penalty when using core counts above 12.  I don't have the
 details in front of me, but you should at the very least get a test
 machine and try running your simulations for short periods with 10, 12, 14,
 16 and 18 cores in use to see how Gromacs behaves with these processors
 (unless someone has done these tests, and can confirm that Gromacs has no
 issues with 16 or 18 core CPUs).

 Harry


 On Feb 24, 2015, at 1:32 PM, David McGiven wrote:

 Hi Szilard,

 Thank you very much for your great advice.

  2015-02-20 19:03 GMT+01:00 Szilárd Páll pall.szil...@gmail.com:

  On Fri, Feb 20, 2015 at 2:17 PM, David McGiven davidmcgiv...@gmail.com
 wrote:
 Dear Gromacs users and developers,

 We are thinking about buying a new cluster of ten or twelve 1U/2U
 machines
  with 2 Intel Xeon CPUs, 8-12 cores each. Some of the 2600 v2 or v3 series.
 Not yet clear the details, we'll see.

 If you can afford them get the 14/16 or 18 core v3 Haswells, those are
 *really* fast, but a pair can cost as much as a decent car.

 Get IVB (v2) if it saves you a decent amount of money compared to v3.
 The AVX2 with FMA of the Haswell chips is great, but if you run
 GROMACS with GPUs on them my guess is that a higher frequency v2 will
 be more advantageous than the v3's AVX2 support. Won't swear on this
 as I have not tested thoroughly.


 According to an email exchange I had with Carsten Kutzner, for the kind of
 simulations we would like to run (see below), lower frequency v2's give
 better performance-to-price ratio.

 For instance, we can get from a national reseller :

  2U server (Supermicro rebranded, I guess)
  2 x E5-2699 v3, 18 cores, 2.3 GHz
  64 GB DDR4
  2 x GTX 980 (certified for the server)
  -
  13,400 EUR (excl. VAT)


  2U server (Supermicro rebranded, I guess)
  2 x E5-2695 v2, 12 cores, 2.4 GHz
  64 GB DDR3
  2 x GTX 980 (certified for the server)
  -
  9,140 EUR (excl. VAT)

  Does that qualify as saving a decent amount of money to go for the v2 ? I
  don't think so, also because we care about rack space: fewer servers, but
  powerful ones. The latest Haswells are way too overpriced for us.

  We want to run molecular dynamics simulations of transmembrane proteins
  inside a POPC lipid bilayer, in a system with ~100,000 atoms, of which
  almost 1/3 correspond to water molecules, employing the usual conditions
  with PME for electrostatics and cutoffs for LJ interactions.

 I think we'll go for the V3 version.

 I've been told in this list that NVIDIA GTX offer the best
 performance/price ratio for gromacs 5.0.

 Yes, that is the case.

 However, I am wondering ... How do you guys use the GTX cards in rackable
 servers ?

  GTX cards are consumer grade, for personal workstations, gaming, and so
  on, and it's nearly impossible to find any server manufacturer like HP,
  Dell, SuperMicro, etc. to certify that those cards will function properly
  on their servers.

 Certification can be an issue - unless you buy many and you can cut a
 deal with a company. There are some companies that do certify servers,
  but AFAIK most/all are US-based. I won't post a long
  advertisement here, but you can find many names if you browse NVIDIA's
 GPU computing site (and as a matter of fact the AMBER GPU site is
 quite helpful in this respect too).

 You can consider getting vanilla server nodes and plug the GTX cards
 in yourself. In general, I can recommend Supermicro, they have pretty
 good value servers from 1 to 4U. The easiest is to use the latter
 because GTX cards will just fit vertically, but it will be a serious
 waste of rack-space.

 With a bit of tinkering you may be able to get
 GTX cards into 3U, but you'll either need cards with connectors

Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?

2015-02-24 Thread David McGiven
Hi Szilard,

Thank you very much for your great advice.

2015-02-20 19:03 GMT+01:00 Szilárd Páll pall.szil...@gmail.com:

 On Fri, Feb 20, 2015 at 2:17 PM, David McGiven davidmcgiv...@gmail.com
 wrote:
  Dear Gromacs users and developers,
 
  We are thinking about buying a new cluster of ten or twelve 1U/2U
 machines
  with 2 Intel Xeon CPUs, 8-12 cores each. Some of the 2600 v2 or v3 series.
  Not yet clear the details, we'll see.

 If you can afford them get the 14/16 or 18 core v3 Haswells, those are
 *really* fast, but a pair can cost as much as a decent car.

 Get IVB (v2) if it saves you a decent amount of money compared to v3.
 The AVX2 with FMA of the Haswell chips is great, but if you run
 GROMACS with GPUs on them my guess is that a higher frequency v2 will
 be more advantageous than the v3's AVX2 support. Won't swear on this
 as I have not tested thoroughly.


According to an email exchange I had with Carsten Kutzner, for the kind of
simulations we would like to run (see below), lower frequency v2's give
better performance-to-price ratio.

For instance, we can get from a national reseller :

2U server (Supermicro rebranded, I guess)
2 x E5-2699 v3, 18 cores, 2.3 GHz
64 GB DDR4
2 x GTX 980 (certified for the server)
-
13,400 EUR (excl. VAT)


2U server (Supermicro rebranded, I guess)
2 x E5-2695 v2, 12 cores, 2.4 GHz
64 GB DDR3
2 x GTX 980 (certified for the server)
-
9,140 EUR (excl. VAT)

Does that qualify as saving a decent amount of money to go for the v2 ? I
don't think so, also because we care about rack space: fewer servers, but
powerful ones. The latest Haswells are way too overpriced for us.

We want to run molecular dynamics simulations of transmembrane proteins
inside a POPC lipid bilayer, in a system with ~100,000 atoms, of which
almost 1/3 correspond to water molecules, employing the usual conditions
with PME for electrostatics and cutoffs for LJ interactions.

I think we'll go for the V3 version.

 I've been told in this list that NVIDIA GTX offer the best
  performance/price ratio for gromacs 5.0.

 Yes, that is the case.

  However, I am wondering ... How do you guys use the GTX cards in rackable
  servers ?
 
   GTX cards are consumer grade, for personal workstations, gaming, and so
   on, and it's nearly impossible to find any server manufacturer like HP,
   Dell, SuperMicro, etc. to certify that those cards will function properly
   on their servers.

 Certification can be an issue - unless you buy many and you can cut a
 deal with a company. There are some companies that do certify servers,
  but AFAIK most/all are US-based. I won't post a long
  advertisement here, but you can find many names if you browse NVIDIA's
 GPU computing site (and as a matter of fact the AMBER GPU site is
 quite helpful in this respect too).

 You can consider getting vanilla server nodes and plug the GTX cards
 in yourself. In general, I can recommend Supermicro, they have pretty
 good value servers from 1 to 4U. The easiest is to use the latter
 because GTX cards will just fit vertically, but it will be a serious
 waste of rack-space.

With a bit of tinkering you may be able to get
 GTX cards into 3U, but you'll either need cards with connectors on the
 back or 90 deg angled 4-pin PCIE power cables. Otherwise you can only
 fit the cards with PCIE raisers and I have no experience with that
 setup, but I know some build denser machines with GTX cards.

 Cheers,

 --
 Szilárd

  What are your views about this ?
 
  Thanks.
 
  Best Regards



Re: [gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?

2015-02-24 Thread David McGiven
2015-02-24 15:46 GMT+01:00 Szilárd Páll pall.szil...@gmail.com:


 
  Perhaps he has seen some real results that do not show issues at 16 or
 18 cores/socket, in which case they would be advantageous, if one can
 afford them.  I am only going on what the manager of our cluster mentioned
 to me in his tests.  But his tests were based on many different software
 packages, so perhaps Gromacs is less/not affected.

 OK, that's an entirely different claim than the one you made
 initially. I dare to say that it is dangerous to mix performance
 observations of many software packages - especially with that of
 GROMACS.


Totally agree.
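Harry's suggestion of trying a range of core counts on a test machine is easy to script with thread-MPI mdrun. A minimal sketch, assuming a prepared topol.tpr for a representative system (file names and step counts are placeholders):

  # short runs at several total thread counts; -resethway ignores start-up
  # cost in the reported ns/day, -pin on keeps the threads on fixed cores
  for nt in 10 12 14 16 18 20; do
      mdrun -nt $nt -pin on -s topol.tpr -deffnm scan_$nt -nsteps 5000 -resethway
  done
  grep -H 'Performance:' scan_*.log

Comparing the ns/day column across the scan_*.log files shows directly whether GROMACS keeps scaling past 12 cores on the CPUs in question.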


Re: [gmx-users] Cluster recommendations

2015-02-20 Thread David McGiven
Hi Carsten,

Sorry I just saw your message today. Thank you very much for the details.

Cheers

2015-02-02 14:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:

 Hi David,

 On 22 Jan 2015, at 18:01, David McGiven davidmcgiv...@gmail.com wrote:

  Hey Carsten,
 
  Just another question. What do you think will be the performance
 difference
  between two gromacs runs with a ~100k atoms system like the one I
 mentioned
  on my first email :
 
  - 1 server with 4 AMD processors, 16 cores each (64 cores) with no GPU
  - 1 server with 4 AMD processors, 16 cores each (64 cores) with one GTX
 980
  GPU
  - 1 server with 2 Intel processors, 10 cores each (20 cores) like the
 ones
  you mentioned, with one or two GTX 980 GPU.
 
  I'm not interested in exact performance numbers, I just need to
 understand
  the logistics behind the CPU/GPU combinations in order to make an
  intelligent cluster purchase.
 I never benchmarked 64-core AMD nodes with GPUs. With an 80 k atom test
 system using a 2 fs time step I get
 24 ns/d on 64 AMD   cores 6272
 16 ns/d on 32 AMD   cores 6380
 36 ns/d on 32 AMD   cores 6380   with 1x GTX 980
 40 ns/d on 32 AMD   cores 6380   with 2x GTX 980
 27 ns/d on 20 Intel cores 2680v2
 52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
 62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980

 So unless you can get the AMD nodes very cheap, probably the 20-core
 Intel nodes with 1 or 2 GPUs will give you the best performance and the
 best
 performance/price.

 Best,
   Carsten

 
  Thanks again.
 
  Best,
  D
 
 
  2015-01-16 14:46 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:
 
  Hi David,
 
  On 16 Jan 2015, at 12:28, David McGiven davidmcgiv...@gmail.com
 wrote:
 
  Hi Carsten,
 
  Thanks for your answer.
 
  2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:
 
  Hi David,
 
  we are just finishing an evaluation to find out which is the optimal
  hardware for Gromacs setups. One of the input systems is an 80,000
 atom
  membrane channel system and thus nearly exactly what you want
  to compute.
 
  The biggest benefit you will get by adding one or two consumer-class
  GPUs
  to your nodes (e.g. NVIDIA GTX 980). That will typically double your
  performance-to-price ratio. This is true for Intel as well as for AMD
  nodes, however the best ratio in our tests was observed with 10-core
  Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
  ideally two of those CPUs with two GPUs on a node.
 
 
   Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm
   wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to
   2690v2, for that matter. There's a significant difference in price
   indeed.
  Usually the percent improvement for Gromacs performance is not as much
  as the percent improvement in clock speed, so the cheaper ones will
  give you a higher performance-to-price ratio.
 
 
   I'm also wondering if the performance would be better with 16-core Intels
   instead of 10-core ones, i.e. the E5-2698 v3.
  Didn’t test those.
 
 
  I would like to know which other tests have you done. What about AMD ?
  We tested AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same
  performance-to-price ratio as a 10 core Intel 2680v2 node with one GTX
 980.
  The Intel node gives you a higher per-node performance, though.
 
 
  Unless you want to buy expensive FDR14 Infiniband, scaling across two
  or more of those nodes won’t be good (~0.65 parallel efficiency across
  2,
  ~0.45 across 4 nodes using QDR infiniband), so I would advise against
  it and go for more sampling on single nodes.
 
 
  Well, that puzzles me. Why is it that you get poor performance ? Are
 you
  talking about pure CPU jobs over infiniband, or are you talking about
  CPU+GPU jobs over infiniband ?
  For a given network (e.g. QDR Infiniband), the scaling is better the
 lower
  the performance of the individual nodes. So for CPU-only nodes you
  will get a better scaling than for CPU+GPU nodes, which have a way
 higher
  per-node performance.
 
  How come you won’t get good performance if a great percentage of
  The performance is good, it is just that the parallel efficiency is
  not optimal for an MD system 100,000 atoms, meaning you do not get two
  times the performance on two nodes in parallel as compared to the
  aggregated performance of two individual runs.
  Bigger systems will have a better parallel efficiency.
 
  supercomputer centers in the world use InfiniBand ? And I'm sure lots
 of
  users here in the list use gromacs over Infiniband.
  I do, too :)
  But you get more trajectory for your money if you can wait and run on
  a single node.
 
  Carsten
 
 
  Thanks again.
 
  Best Regards,
  D
 
 
  Best,
  Carsten
 
 
 
 
  On 15 Jan 2015, at 17:35, David McGiven davidmcgiv...@gmail.com
  wrote:
 
  Dear Gromacs Users,
 
  We’ve got some funding to build a new cluster. It’s going to be used
  mainly
  for gromacs simulations (80% of the time). We run molecular dynamics
  simulations

[gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?

2015-02-20 Thread David McGiven
Dear Gromacs users and developers,

We are thinking about buying a new cluster of ten or twelve 1U/2U machines
with 2 Intel Xeon CPUs, 8-12 cores each. Some of the 2600 v2 or v3 series.
The details are not yet clear, we'll see.

I've been told in this list that NVIDIA GTX offer the best
performance/price ratio for gromacs 5.0.

However, I am wondering ... How do you guys use the GTX cards in rackable
servers ?

GTX cards are consumer grade, for personal workstations, gaming, and so on,
and it's nearly impossible to find any server manufacturer like HP, Dell,
SuperMicro, etc. to certify that those cards will function properly on
their servers.

What are your views about this ?

Thanks.

Best Regards


Re: [gmx-users] Cluster recommendations

2015-01-22 Thread David McGiven
Hey Carsten,

Just another question. What do you think will be the performance difference
between two gromacs runs with a ~100k atoms system like the one I mentioned
on my first email :

- 1 server with 4 AMD processors, 16 cores each (64 cores) with no GPU
- 1 server with 4 AMD processors, 16 cores each (64 cores) with one GTX 980
GPU
- 1 server with 2 Intel processors, 10 cores each (20 cores) like the ones
you mentioned, with one or two GTX 980 GPU.

I'm not interested in exact performance numbers, I just need to understand
the logistics behind the CPU/GPU combinations in order to make an
intelligent cluster purchase.

Thanks again.

Best,
D


2015-01-16 14:46 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:

 Hi David,

 On 16 Jan 2015, at 12:28, David McGiven davidmcgiv...@gmail.com wrote:

  Hi Carsten,
 
  Thanks for your answer.
 
  2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:
 
  Hi David,
 
  we are just finishing an evaluation to find out which is the optimal
  hardware for Gromacs setups. One of the input systems is an 80,000 atom
  membrane channel system and thus nearly exactly what you want
  to compute.
 
  The biggest benefit you will get by adding one or two consumer-class
 GPUs
  to your nodes (e.g. NVIDIA GTX 980). That will typically double your
   performance-to-price ratio. This is true for Intel as well as for AMD
  nodes, however the best ratio in our tests was observed with 10-core
  Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
  ideally two of those CPUs with two GPUs on a node.
 
 
   Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm
   wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to
   2690v2, for that matter. There's a significant difference in price
   indeed.
 Usually the percent improvement for Gromacs performance is not as much
 as the percent improvement in clock speed, so the cheaper ones will
 give you a higher performance-to-price ratio.

 
   I'm also wondering if the performance would be better with 16-core Intels
   instead of 10-core ones, i.e. the E5-2698 v3.
 Didn’t test those.

 
  I would like to know which other tests have you done. What about AMD ?
 We tested AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same
 performance-to-price ratio as a 10 core Intel 2680v2 node with one GTX 980.
 The Intel node gives you a higher per-node performance, though.

 
  Unless you want to buy expensive FDR14 Infiniband, scaling across two
  or more of those nodes won’t be good (~0.65 parallel efficiency across
 2,
  ~0.45 across 4 nodes using QDR infiniband), so I would advise against
  it and go for more sampling on single nodes.
 
 
  Well, that puzzles me. Why is it that you get poor performance ? Are you
  talking about pure CPU jobs over infiniband, or are you talking about
  CPU+GPU jobs over infiniband ?
 For a given network (e.g. QDR Infiniband), the scaling is better the lower
 the performance of the individual nodes. So for CPU-only nodes you
 will get a better scaling than for CPU+GPU nodes, which have a way higher
 per-node performance.

  How come you won’t get good performance if a great percentage of
 The performance is good, it is just that the parallel efficiency is
 not optimal for an MD system 100,000 atoms, meaning you do not get two
 times the performance on two nodes in parallel as compared to the
 aggregated performance of two individual runs.
 Bigger systems will have a better parallel efficiency.

  supercomputer centers in the world use InfiniBand ? And I'm sure lots of
  users here in the list use gromacs over Infiniband.
 I do, too :)
 But you get more trajectory for your money if you can wait and run on
 a single node.

 Carsten

 
  Thanks again.
 
  Best Regards,
  D
 
 
  Best,
   Carsten
 
 
 
 
  On 15 Jan 2015, at 17:35, David McGiven davidmcgiv...@gmail.com
 wrote:
 
  Dear Gromacs Users,
 
  We’ve got some funding to build a new cluster. It’s going to be used
  mainly
  for gromacs simulations (80% of the time). We run molecular dynamics
  simulations of transmembrane proteins inside a POPC lipid bilayer. In a
   typical system we have ~100,000 atoms, from which almost 1/3 correspond
  to
   water molecules. We employ usual conditions with PME for electrostatics
  and
  cutoffs for LJ interactions.
 
  I would like to hear your advice on which kind of machines are the best
  bang-for-the-buck for that kind of simulations. For instance :
 
  - Intel or AMD ? My understanding is that Intel is faster but
 expensive,
  and AMD is slower but cheaper. So at the end you almost get the same
  performance-per-buck. Right ?
 
  - Many CPUs/Cores x machine or less ? My understanding is that the more
  cores x machine the lesser the costs. One machine is always cheaper to
  buy
  and maintain than various. Plus maybe you can save the costs of
  Infiniband
  if you use large core densities ?
 
  - Should we invest in an Infiniband network to run jobs across multiple
  nodes

Re: [gmx-users] Cluster recommendations

2015-01-22 Thread David McGiven
Sorry, where it says "between two gromacs runs" I meant to say "three
gromacs runs", one for each combination of CPU/GPU.

2015-01-22 18:01 GMT+01:00 David McGiven davidmcgiv...@gmail.com:

 Hey Carsten,

 Just another question. What do you think will be the performance
 difference between two gromacs runs with a ~100k atoms system like the one
 I mentioned on my first email :

 - 1 server with 4 AMD processors, 16 cores each (64 cores) with no GPU
 - 1 server with 4 AMD processors, 16 cores each (64 cores) with one GTX
 980 GPU
 - 1 server with 2 Intel processors, 10 cores each (20 cores) like the ones
 you mentioned, with one or two GTX 980 GPU.

 I'm not interested in exact performance numbers, I just need to understand
 the logistics behind the CPU/GPU combinations in order to make an
  intelligent cluster purchase.

 Thanks again.

 Best,
 D


 2015-01-16 14:46 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:

 Hi David,

 On 16 Jan 2015, at 12:28, David McGiven davidmcgiv...@gmail.com wrote:

  Hi Carsten,
 
  Thanks for your answer.
 
  2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:
 
  Hi David,
 
  we are just finishing an evaluation to find out which is the optimal
  hardware for Gromacs setups. One of the input systems is an 80,000 atom
  membrane channel system and thus nearly exactly what you want
  to compute.
 
  The biggest benefit you will get by adding one or two consumer-class
 GPUs
  to your nodes (e.g. NVIDIA GTX 980). That will typically double your
   performance-to-price ratio. This is true for Intel as well as for AMD
  nodes, however the best ratio in our tests was observed with 10-core
  Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
  ideally two of those CPUs with two GPUs on a node.
 
 
   Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm
   wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to
   2690v2, for that matter. There's a significant difference in price
   indeed.
 Usually the percent improvement for Gromacs performance is not as much
 as the percent improvement in clock speed, so the cheaper ones will
 give you a higher performance-to-price ratio.

 
   I'm also wondering if the performance would be better with 16-core Intels
   instead of 10-core ones, i.e. the E5-2698 v3.
 Didn’t test those.

 
  I would like to know which other tests have you done. What about AMD ?
 We tested AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same
 performance-to-price ratio as a 10 core Intel 2680v2 node with one GTX
 980.
 The Intel node gives you a higher per-node performance, though.

 
  Unless you want to buy expensive FDR14 Infiniband, scaling across two
  or more of those nodes won’t be good (~0.65 parallel efficiency across
 2,
  ~0.45 across 4 nodes using QDR infiniband), so I would advise against
  it and go for more sampling on single nodes.
 
 
  Well, that puzzles me. Why is it that you get poor performance ? Are you
  talking about pure CPU jobs over infiniband, or are you talking about
  CPU+GPU jobs over infiniband ?
 For a given network (e.g. QDR Infiniband), the scaling is better the lower
 the performance of the individual nodes. So for CPU-only nodes you
 will get a better scaling than for CPU+GPU nodes, which have a way higher
 per-node performance.

  How come you won’t get good performance if a great percentage of
 The performance is good, it is just that the parallel efficiency is
 not optimal for an MD system 100,000 atoms, meaning you do not get two
 times the performance on two nodes in parallel as compared to the
 aggregated performance of two individual runs.
 Bigger systems will have a better parallel efficiency.

  supercomputer centers in the world use InfiniBand ? And I'm sure lots of
  users here in the list use gromacs over Infiniband.
 I do, too :)
 But you get more trajectory for your money if you can wait and run on
 a single node.

 Carsten

 
  Thanks again.
 
  Best Regards,
  D
 
 
  Best,
   Carsten
 
 
 
 
  On 15 Jan 2015, at 17:35, David McGiven davidmcgiv...@gmail.com
 wrote:
 
  Dear Gromacs Users,
 
  We’ve got some funding to build a new cluster. It’s going to be used
  mainly
  for gromacs simulations (80% of the time). We run molecular dynamics
  simulations of transmembrane proteins inside a POPC lipid bilayer. In
 a
   typical system we have ~100,000 atoms, from which almost 1/3
  correspond to
   water molecules. We employ usual conditions with PME for
  electrostatics
  and
  cutoffs for LJ interactions.
 
  I would like to hear your advice on which kind of machines are the
 best
  bang-for-the-buck for that kind of simulations. For instance :
 
  - Intel or AMD ? My understanding is that Intel is faster but
 expensive,
  and AMD is slower but cheaper. So at the end you almost get the same
  performance-per-buck. Right ?
 
  - Many CPUs/Cores x machine or less ? My understanding is that the
 more
  cores x machine the lesser the costs. One machine is always cheaper to
  buy

Re: [gmx-users] Cluster recommendations

2015-01-20 Thread David McGiven
Thank you very much Carsten.

2015-01-16 14:46 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:

 Hi David,

 On 16 Jan 2015, at 12:28, David McGiven davidmcgiv...@gmail.com wrote:

  Hi Carsten,
 
  Thanks for your answer.
 
  2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:
 
  Hi David,
 
  we are just finishing an evaluation to find out which is the optimal
  hardware for Gromacs setups. One of the input systems is an 80,000 atom
  membrane channel system and thus nearly exactly what you want
  to compute.
 
  The biggest benefit you will get by adding one or two consumer-class
 GPUs
  to your nodes (e.g. NVIDIA GTX 980). That will typically double your
   performance-to-price ratio. This is true for Intel as well as for AMD
  nodes, however the best ratio in our tests was observed with 10-core
  Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
  ideally two of those CPUs with two GPUs on a node.
 
 
   Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm
   wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to
   2690v2, for that matter. There's a significant difference in price
   indeed.
 Usually the percent improvement for Gromacs performance is not as much
 as the percent improvement in clock speed, so the cheaper ones will
 give you a higher performance-to-price ratio.

 
   I'm also wondering if the performance would be better with 16-core Intels
   instead of 10-core ones, i.e. the E5-2698 v3.
 Didn’t test those.

 
  I would like to know which other tests have you done. What about AMD ?
 We tested AMD 6380 with 1-2 GTX 980 GPUs, which gives about the same
 performance-to-price ratio as a 10 core Intel 2680v2 node with one GTX 980.
 The Intel node gives you a higher per-node performance, though.

 
  Unless you want to buy expensive FDR14 Infiniband, scaling across two
  or more of those nodes won’t be good (~0.65 parallel efficiency across
 2,
  ~0.45 across 4 nodes using QDR infiniband), so I would advise against
  it and go for more sampling on single nodes.
 
 
  Well, that puzzles me. Why is it that you get poor performance ? Are you
  talking about pure CPU jobs over infiniband, or are you talking about
  CPU+GPU jobs over infiniband ?
 For a given network (e.g. QDR Infiniband), the scaling is better the lower
 the performance of the individual nodes. So for CPU-only nodes you
 will get a better scaling than for CPU+GPU nodes, which have a way higher
 per-node performance.

  How come you won’t get good performance if a great percentage of
 The performance is good, it is just that the parallel efficiency is
 not optimal for an MD system 100,000 atoms, meaning you do not get two
 times the performance on two nodes in parallel as compared to the
 aggregated performance of two individual runs.
 Bigger systems will have a better parallel efficiency.

  supercomputer centers in the world use InfiniBand ? And I'm sure lots of
  users here in the list use gromacs over Infiniband.
 I do, too :)
 But you get more trajectory for your money if you can wait and run on
 a single node.

 Carsten

 
  Thanks again.
 
  Best Regards,
  D
 
 
  Best,
   Carsten
 
 
 
 
  On 15 Jan 2015, at 17:35, David McGiven davidmcgiv...@gmail.com
 wrote:
 
  Dear Gromacs Users,
 
  We’ve got some funding to build a new cluster. It’s going to be used
  mainly
  for gromacs simulations (80% of the time). We run molecular dynamics
  simulations of transmembrane proteins inside a POPC lipid bilayer. In a
   typical system we have ~100,000 atoms, from which almost 1/3 correspond
  to
   water molecules. We employ usual conditions with PME for electrostatics
  and
  cutoffs for LJ interactions.
 
  I would like to hear your advice on which kind of machines are the best
  bang-for-the-buck for that kind of simulations. For instance :
 
  - Intel or AMD ? My understanding is that Intel is faster but
 expensive,
  and AMD is slower but cheaper. So at the end you almost get the same
  performance-per-buck. Right ?
 
  - Many CPUs/Cores x machine or less ? My understanding is that the more
  cores x machine the lesser the costs. One machine is always cheaper to
  buy
  and maintain than various. Plus maybe you can save the costs of
  Infiniband
  if you use large core densities ?
 
  - Should we invest in an Infiniband network to run jobs across multiple
  nodes ? Will the kind of simulations we run benefit from multiple
 nodes ?
 
  - Would we benefit from adding GPU’s to the cluster ? If so, which
 ones ?
 
  We now have a cluster with 48 and 64 AMD Opteron cores x machine (4
  processors x machine) and we run our gromacs simulations there. We
 don’t
  use MPI because our jobs are mostly run in a single node. As I said,
 with
  48 or 64 cores x simulation in a single machine. So far, we’re quite
  satisfied with the performance we get.
 
  Any advice will be greatly appreciated.
 
 
  Best Regards,
  D.

Re: [gmx-users] Cluster recommendations

2015-01-16 Thread David McGiven
Hi Carsten,

Thanks for your answer.

2015-01-16 11:11 GMT+01:00 Carsten Kutzner ckut...@gwdg.de:

 Hi David,

 we are just finishing an evaluation to find out which is the optimal
 hardware for Gromacs setups. One of the input systems is an 80,000 atom
 membrane channel system and thus nearly exactly what you want
 to compute.

 The biggest benefit you will get by adding one or two consumer-class GPUs
 to your nodes (e.g. NVIDIA GTX 980). That will typically double your
  performance-to-price ratio. This is true for Intel as well as for AMD
 nodes, however the best ratio in our tests was observed with 10-core
 Intel CPUs (2670v2, 2680v2) in combination with a GTX 780Ti or 980,
 ideally two of those CPUs with two GPUs on a node.


Was there a difference between 2670v2 (2.5 GHz) and 2680v2 (2.8 GHz) ? I'm
wondering if those 0.3 GHz are significant. Or the 0.5 GHz compared to
2690v2, for that matter. There's a significant difference in price indeed.

I'm also wondering if the performance would be better with 16-core Intels
instead of 10-core ones, i.e. the E5-2698 v3.

I would like to know which other tests have you done. What about AMD ?

Unless you want to buy expensive FDR14 Infiniband, scaling across two
 or more of those nodes won’t be good (~0.65 parallel efficiency across 2,
 ~0.45 across 4 nodes using QDR infiniband), so I would advise against
 it and go for more sampling on single nodes.


Well, that puzzles me. Why is it that you get poor performance ? Are you
talking about pure CPU jobs over infiniband, or are you talking about
CPU+GPU jobs over infiniband ?

How come you won't get good performance if a great percentage of
supercomputer centers in the world use InfiniBand ? And I'm sure lots of
users here in the list use gromacs over Infiniband.

Thanks again.

Best Regards,
D


 Best,
   Carsten




 On 15 Jan 2015, at 17:35, David McGiven davidmcgiv...@gmail.com wrote:

  Dear Gromacs Users,
 
  We’ve got some funding to build a new cluster. It’s going to be used
 mainly
  for gromacs simulations (80% of the time). We run molecular dynamics
  simulations of transmembrane proteins inside a POPC lipid bilayer. In a
  typical system we have ~100,000 atoms, from which almost 1/3 correspond to
  water molecules. We employ usual conditions with PME for electrostatics
 and
  cutoffs for LJ interactions.
 
  I would like to hear your advice on which kind of machines are the best
  bang-for-the-buck for that kind of simulations. For instance :
 
  - Intel or AMD ? My understanding is that Intel is faster but expensive,
  and AMD is slower but cheaper. So at the end you almost get the same
  performance-per-buck. Right ?
 
  - Many CPUs/Cores x machine or less ? My understanding is that the more
  cores x machine the lesser the costs. One machine is always cheaper to
 buy
  and maintain than various. Plus maybe you can save the costs of
 Infiniband
  if you use large core densities ?
 
  - Should we invest in an Infiniband network to run jobs across multiple
  nodes ? Will the kind of simulations we run benefit from multiple nodes ?
 
  - Would we benefit from adding GPU’s to the cluster ? If so, which ones ?
 
  We now have a cluster with 48 and 64 AMD Opteron cores x machine (4
  processors x machine) and we run our gromacs simulations there. We don’t
  use MPI because our jobs are mostly run in a single node. As I said, with
  48 or 64 cores x simulation in a single machine. So far, we’re quite
  satisfied with the performance we get.
 
  Any advice will be greatly appreciated.
 
 
  Best Regards,
  D.


 --
 Dr. Carsten Kutzner
 Max Planck Institute for Biophysical Chemistry
 Theoretical and Computational Biophysics
 Am Fassberg 11, 37077 Goettingen, Germany
 Tel. +49-551-2012313, Fax: +49-551-2012302
 http://www.mpibpc.mpg.de/grubmueller/kutzner
 http://www.mpibpc.mpg.de/grubmueller/sppexa




[gmx-users] Cluster recommendations

2015-01-15 Thread David McGiven
Dear Gromacs Users,

We’ve got some funding to build a new cluster. It’s going to be used mainly
for gromacs simulations (80% of the time). We run molecular dynamics
simulations of transmembrane proteins inside a POPC lipid bilayer. In a
typical system we have ~100,000 atoms, from which almost 1/3 correspond to
water molecules. We employ usual conditions with PME for electrostatics and
cutoffs for LJ interactions.

I would like to hear your advice on which kind of machines are the best
bang-for-the-buck for that kind of simulations. For instance :

- Intel or AMD ? My understanding is that Intel is faster but expensive,
and AMD is slower but cheaper, so in the end you almost get the same
performance per buck. Right ?

- Many CPUs/cores per machine or fewer ? My understanding is that the more
cores per machine, the lower the costs. One machine is always cheaper to buy
and maintain than several. Plus, maybe you can save the cost of Infiniband
if you use high core densities ?

- Should we invest in an Infiniband network to run jobs across multiple
nodes ? Will the kind of simulations we run benefit from multiple nodes ?

- Would we benefit from adding GPUs to the cluster ? If so, which ones ?

We now have a cluster with 48 and 64 AMD Opteron cores per machine (4
processors per machine) and we run our gromacs simulations there. We don't
use MPI because our jobs are mostly run on a single node. As I said, with
48 or 64 cores per simulation in a single machine. So far, we're quite
satisfied with the performance we get.

Any advice will be greatly appreciated.


Best Regards,
D.
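For the single-node, no-MPI runs described above, the main GPU question is how mdrun's thread-MPI ranks and OpenMP threads map onto the cards. A minimal sketch for a node with two Intel 10-core CPUs and two GTX 980s, assuming a prepared topol.tpr (all names are placeholders):

  # GPU run: one thread-MPI rank per GPU, 10 OpenMP threads per rank
  mdrun -ntmpi 2 -ntomp 10 -gpu_id 01 -s topol.tpr -deffnm run_2gpu

  # CPU-only run on the same node, for a like-for-like comparison
  mdrun -nb cpu -nt 20 -s topol.tpr -deffnm run_cpu

The ratio of the two 'Performance:' lines in run_2gpu.log and run_cpu.log is essentially the per-node speedup the GPUs buy, which can then be weighed against their price.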


Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-09 Thread David McGiven
Thank you very much to all of you. That should explain the difference in
performance.

I'll also discuss it with a more gromacs-knowledgeable colleague of mine.

Best Regards.

2014-09-06 8:58 GMT+02:00 Abhi Acharya abhi117acha...@gmail.com:

 Thank you Mark and Szilard for your replies. It gave more clarity on how
 the new gromacs works,
 especially in greater support for streamed computing.

 I hope David's problem is sorted too. :)

 Thanks again,

 Regards,
 Abhishek Acharya


 On Fri, Sep 5, 2014 at 10:45 PM, Szilárd Páll pall.szil...@gmail.com
 wrote:

  On Fri, Sep 5, 2014 at 6:40 PM, Abhishek Acharya
  abhi117acha...@gmail.com wrote:
   Dear Mark,
  
   Thank you for the insightful reply.
   In the manual for gromacs 5.0 it was mentioned that verlet scheme is
  better for GPU systems.
 
  More correctly, only the Verlet scheme supports GPU acceleration. The
  algorithms used by the group scheme are not appropriate for GPUs or
  other wide-SIMD accelerators.
 
   Does that mean that we should give up on the group scheme scheme, even
  though we get good performance compared to verlet?
 
  That's up to you to decide. The algorithms are different, the group
  scheme does not use a buffer by default, while the verlet scheme does
  and aims to control the drift (and keep it quite low by default).
 
   Future plan of removing group cut-off scheme indicates that it must
 have
  been associated with a high cost-benefit ratio.
 
  What makes you conclude that? The reasons are described here:
  http://www.gromacs.org/Documentation/Cut-off_schemes
 
  In very brief summary: i) the group scheme is not suitable for
  accelerators and wide SIMD architectures ii) energy conservation comes at a
  high performance penalty iii) it is inconvenient for high parallelization
  as it increases load imbalance
 
  Cheers,
  --
  Szilárd
 
   Could you please shed little light on this  ?
   Thanks.
  
   Regards,
   Abhishek
  
   -Original Message-
   From: Mark Abraham mark.j.abra...@gmail.com
    Sent: 9/5/2014 7:57 PM
    To: Discussion list for GROMACS users gmx-us...@gromacs.org
    Subject: Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What
  went wrong ?
  
   This cutoff-scheme difference is probably caused by using an .mdp file
  that
   does not specify the cutoff scheme, and the default changed in 5.0.
  grompp
   issued a note about this, if you go and check it. The change in the
 -npme
   choice is a direct consequence of this; the heuristics underlying the
   splitting choice approximately understand the relative performance
   characteristics of the two implementations, and you can see that in
   practice the reported PP/PME balance is decent in each case.
  
   There is indeed a large chunk of water (which you can see in
 group-scheme
    log files, e.g. the line in the FLOP accounting that says NB VdW & Elec.
   [W3-W3,F] dominates the cost), and David's neighbour list is
 unbuffered.
   This is indeed the regime where the group scheme might still
 out-perform
   the Verlet scheme (depending whether you value buffering in the
 neighbour
   list, which you generally should!).
  
   Mark
  
  
   On Fri, Sep 5, 2014 at 4:06 PM, Abhi Acharya abhi117acha...@gmail.com
 
   wrote:
  
   Hello,
   Is you system solvated with water molecules?
  
    The reason I ask is that, in case of the run with 4.6.5 your gromacs
  has
    used the group cut-off scheme, whereas 5.0 has used the verlet scheme. For
   systems
    with water molecules, the group scheme gives better performance than
  verlet.
  
   For more check out:
   http://www.gromacs.org/Documentation/Cut-off_schemes
  
   Regards,
   Abhishek Acharya
  
   On Fri, Sep 5, 2014 at 7:28 PM, Carsten Kutzner ckut...@gwdg.de
  wrote:
  
Hi,
   
you might want to use g_tune_pme to find out the optimal number
of PME nodes for 4.6 and 5.0.
   
Carsten
   
   
   
On 05 Sep 2014, at 15:39, David McGiven davidmcgiv...@gmail.com
  wrote:
   
  What is even more strange is that I tried with 16 PME nodes (mdrun
    -ntmpi
  48 -v -c TEST_md.gro -npme 16), got a 15.8% performance loss and
  ns/day
are
 very similar : 33 ns/day

 D.

 2014-09-05 14:54 GMT+02:00 David McGiven davidmcgiv...@gmail.com
 :

 Hi Abhi,

 Yes I noticed that imbalance but I thought gromacs knew better
 than
   the
 user how to split PP/PME!!

 How is it possible that 4.6.5 guesses better than 5.0 ?

 Anyway, I tried :
 mdrun -nt 48 -v -c test.out

 Exits with an error You need to explicitly specify the number of
  MPI
 threads (-ntmpi) when using separate PME ranks

 Then:
 mdrun -ntmpi 48 -v -c TEST_md.gro -npme 12

 Then again 35 ns/day with the warning :
 NOTE: 8.5 % performance was lost because the PME ranks
  had less work to do than the PP ranks.
  You might want to decrease the number of PME ranks
  or decrease the cut-off and the grid spacing

[gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread David McGiven
Dear Gromacs users,

I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2), same OS
(RHEL 6), same configuration options and basically everything else the same
as my previous gromacs 4.6.5 compilation, and when running one of our typical
simulations I get worse performance.

4.6.5 does 45 ns/day
5.0 does 35 ns/day

Do you have any idea of what could be happening ?

Thanks.

Best Regards,
D.


Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread David McGiven
The command line in both cases is :
1st : grompp -f grompp.mdp -c conf.gro -n index.ndx
2nd : mdrun -nt 48 -v -c test.out

By log file, do you mean the standard output/error ? Attached to the email ?

Thanks

2014-09-05 12:30 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:

 Please post the command lines you used to invoke mdrun as well as the
 log files of the runs you are comparing.

 Cheers,
 --
 Szilárd


 On Fri, Sep 5, 2014 at 12:10 PM, David McGiven davidmcgiv...@gmail.com
 wrote:
  Dear Gromacs users,
 
  I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2), same OS
  (RHEL 6) same configuration options and basically everything than my
  previous gromacs 4.6.5 compilation and when doing one of our typical
  simulations, I get worst performance.
 
  4.6.5 does 45 ns/day
  5.0 does 35 ns/day
 
  Do you have any idea of what could be happening ?
 
  Thanks.
 
  Best Regards,
  D.

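As Szilárd's reply notes, the numbers worth comparing are in mdrun's log file (md.log by default), not in the terminal output. A minimal sketch for pulling the relevant bits out of two runs, assuming they were done in separate directories run_4.6.5/ and run_5.0/ (directory names are placeholders):

  # final throughput and timing summary of each run
  grep -B 2 -A 2 'Performance:' run_4.6.5/md.log run_5.0/md.log

  # the cut-off scheme actually used is echoed in the input-parameter listing
  grep -i 'cutoff-scheme' run_4.6.5/md.log run_5.0/md.log

The 'Performance:' lines give ns/day directly, and the cutoff-scheme lines make the group-vs-Verlet difference discussed elsewhere in this thread visible at a glance.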


Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread David McGiven
Thanks Szilard, here it goes! :

4.6.5 : http://pastebin.com/nqBn3FKs
5.0 : http://pastebin.com/kR4ntHtK

2014-09-05 12:47 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:

 mdrun writes a log file, named md.log by default, which contains among
 other things results of hardware detection and performance
 measurements. The list does not accept attachments, please upload it
 somewhere (dropbox, pastebin, etc.) and post a link.

 Cheers,
 --
 Szilárd


 On Fri, Sep 5, 2014 at 12:37 PM, David McGiven davidmcgiv...@gmail.com
 wrote:
  Command line in both cases is :
  1st : grompp -f grompp.mdp -c conf.gro -n index.ndx
  2nd :mdrun -nt 48 -v -c test.out
 
  Log file you mean the standard output/error ? Attached to the email ?
 
  Thanks
 
  2014-09-05 12:30 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:
 
  Please post the command lines you used to invoke mdrun as well as the
  log files of the runs you are comparing.
 
  Cheers,
  --
  Szilárd
 
 
  On Fri, Sep 5, 2014 at 12:10 PM, David McGiven davidmcgiv...@gmail.com
 
  wrote:
   Dear Gromacs users,
  
   I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2), same
 OS
   (RHEL 6) same configuration options and basically everything than my
   previous gromacs 4.6.5 compilation and when doing one of our typical
   simulations, I get worst performance.
  
   4.6.5 does 45 ns/day
   5.0 does 35 ns/day
  
   Do you have any idea of what could be happening ?
  
   Thanks.
  
   Best Regards,
   D.



Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread David McGiven
Hi Abhi,

Yes I noticed that imbalance but I thought gromacs knew better than the
user how to split PP/PME!!

How is it possible that 4.6.5 guesses better than 5.0 ?

Anyway, I tried :
mdrun -nt 48 -v -c test.out

It exits with the error "You need to explicitly specify the number of MPI
threads (-ntmpi) when using separate PME ranks".

Then:
mdrun -ntmpi 48 -v -c TEST_md.gro -npme 12

Then again 35 ns/day with the warning :
NOTE: 8.5 % performance was lost because the PME ranks
  had less work to do than the PP ranks.
  You might want to decrease the number of PME ranks
  or decrease the cut-off and the grid spacing.


I don't know much about Gromacs so I am puzzled.




2014-09-05 14:32 GMT+02:00 Abhi Acharya abhi117acha...@gmail.com:

 Hello,

 From the log files it is clear that out of 48 cores, the 5.0 run had 8
 cores allocated to PME while the 4.6.5 run had 12 cores. This seems to have
 caused a greater load imbalance in case of the 5.0 run.

   If you look at the last table in both .log files, you will notice that the
  PME spread/gather wall time values for 5.0 are more than double the wall
  time value in the case of 4.6.5.

 Try running the simulation by explicitly setting the -npme flag as 12.

 Regards,
 Abhishek Acharya


 On Fri, Sep 5, 2014 at 4:43 PM, David McGiven davidmcgiv...@gmail.com
 wrote:

  Thanks Szilard, here it goes! :
 
  4.6.5 : http://pastebin.com/nqBn3FKs
  5.0 : http://pastebin.com/kR4ntHtK
 
  2014-09-05 12:47 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:
 
   mdrun writes a log file, named md.log by default, which contains among
   other things results of hardware detection and performance
   measurements. The list does not accept attachments, please upload it
   somewhere (dropbox, pastebin, etc.) and post a link.
  
   Cheers,
   --
   Szilárd
  
  
   On Fri, Sep 5, 2014 at 12:37 PM, David McGiven 
 davidmcgiv...@gmail.com
   wrote:
Command line in both cases is :
1st : grompp -f grompp.mdp -c conf.gro -n index.ndx
2nd :mdrun -nt 48 -v -c test.out
   
Log file you mean the standard output/error ? Attached to the email ?
   
Thanks
   
2014-09-05 12:30 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:
   
Please post the command lines you used to invoke mdrun as well as
 the
log files of the runs you are comparing.
   
Cheers,
--
Szilárd
   
   
On Fri, Sep 5, 2014 at 12:10 PM, David McGiven 
  davidmcgiv...@gmail.com
   
wrote:
 Dear Gromacs users,

 I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2),
 same
   OS
 (RHEL 6) same configuration options and basically everything than
 my
 previous gromacs 4.6.5 compilation and when doing one of our
 typical
 simulations, I get worst performance.

 4.6.5 does 45 ns/day
 5.0 does 35 ns/day

 Do you have any idea of what could be happening ?

 Thanks.

 Best Regards,
 D.
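Rather than guessing -npme by hand as above, Carsten's g_tune_pme suggestion from earlier in this thread automates exactly this scan. A minimal sketch, assuming a topol.tpr for the benchmark system and the GROMACS 4.6/5.0 tool names (option details vary a bit between versions, so check g_tune_pme -h):

  # try several PP/PME splits with short timed runs and report the fastest
  g_tune_pme -ntmpi 48 -s topol.tpr -steps 2000 -r 2

  # manual fallback: the same scan as an explicit loop over -npme
  for npme in 8 10 12 16; do
      mdrun -ntmpi 48 -npme $npme -s topol.tpr -deffnm npme_$npme -nsteps 5000 -resethway
  done
  grep -H 'Performance:' npme_*.log

Either way, the point is to measure the best split for this particular system and node rather than to rely on the built-in heuristic that picked 8 PME ranks for the 5.0 run.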

Re: [gmx-users] Gromacs 5.0 compilation slower than 4.6.5. What went wrong ?

2014-09-05 Thread David McGiven
What is even more strange is that I tried with 16 PME nodes (mdrun -ntmpi
48 -v -c TEST_md.gro -npme 16), got a 15.8% performance loss and ns/day are
very similar : 33 ns/day

D.

2014-09-05 14:54 GMT+02:00 David McGiven davidmcgiv...@gmail.com:

 Hi Abhi,

 Yes I noticed that imbalance but I thought gromacs knew better than the
 user how to split PP/PME!!

 How is it possible that 4.6.5 guesses better than 5.0 ?

 Anyway, I tried :
 mdrun -nt 48 -v -c test.out

 Exits with an error You need to explicitly specify the number of MPI
 threads (-ntmpi) when using separate PME ranks

 Then:
 mdrun -ntmpi 48 -v -c TEST_md.gro -npme 12

 Then again 35 ns/day with the warning :
 NOTE: 8.5 % performance was lost because the PME ranks
   had less work to do than the PP ranks.
   You might want to decrease the number of PME ranks
   or decrease the cut-off and the grid spacing.


 I don't know much about Gromacs so I am puzzled.




 2014-09-05 14:32 GMT+02:00 Abhi Acharya abhi117acha...@gmail.com:

 Hello,

 From the log files it is clear that out of 48 cores, the 5.0 run had 8
 cores allocated to PME while the 4.6.5 run had 12 cores. This seems to
 have
 caused a greater load imbalance in case of the 5.0 run.

   If you look at the last table in both .log files, you will notice that the
  PME spread/gather wall time values for 5.0 are more than double the wall
  time value in the case of 4.6.5.

 Try running the simulation by explicitly setting the -npme flag as 12.

 Regards,
 Abhishek Acharya


 On Fri, Sep 5, 2014 at 4:43 PM, David McGiven davidmcgiv...@gmail.com
 wrote:

  Thanks Szilard, here it goes! :
 
  4.6.5 : http://pastebin.com/nqBn3FKs
  5.0 : http://pastebin.com/kR4ntHtK
 
  2014-09-05 12:47 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:
 
   mdrun writes a log file, named md.log by default, which contains among
   other things results of hardware detection and performance
   measurements. The list does not accept attachments, please upload it
   somewhere (dropbox, pastebin, etc.) and post a link.
  
   Cheers,
   --
   Szilárd
  
  
   On Fri, Sep 5, 2014 at 12:37 PM, David McGiven 
 davidmcgiv...@gmail.com
   wrote:
Command line in both cases is :
1st : grompp -f grompp.mdp -c conf.gro -n index.ndx
2nd :mdrun -nt 48 -v -c test.out
   
Log file you mean the standard output/error ? Attached to the email
 ?
   
Thanks
   
2014-09-05 12:30 GMT+02:00 Szilárd Páll pall.szil...@gmail.com:
   
Please post the command lines you used to invoke mdrun as well as
 the
log files of the runs you are comparing.
   
Cheers,
--
Szilárd
   
   
On Fri, Sep 5, 2014 at 12:10 PM, David McGiven 
  davidmcgiv...@gmail.com
   
wrote:
 Dear Gromacs users,

 I just compiled gromacs 5.0 with the same compiler (gcc 4.7.2),
 same
   OS
 (RHEL 6) same configuration options and basically everything
 than my
 previous gromacs 4.6.5 compilation and when doing one of our
 typical
 simulations, I get worst performance.

 4.6.5 does 45 ns/day
 5.0 does 35 ns/day

 Do you have any idea of what could be happening ?

 Thanks.

 Best Regards,
 D.