Re: [gmx-users] Performance of beowulf cluster

2014-08-11 Thread Szilárd Páll
On Tue, Aug 5, 2014 at 5:21 PM, Abhi Acharya abhi117acha...@gmail.com wrote:
 Thank you Mirco and Szilard,
 With regard to the GPU system, I have decided on a Xeon E5-1650 v2 system
 with a GeForce GTX 780 Ti GPU for equilibration and production runs with
 small systems. But for large systems or REMD simulations, I am a bit
 skeptical about banking on GPU systems.

How would you define large? A 100k-atom protein system (PME, rc = 0.9 nm,
virtual sites, 5 fs time step) will run at 50 ns/day on a box like the
above, but ~5x (!) slower on an FX 8350 without a GPU! Some numbers I had
around, plus the CPU-only ones from some quick-and-dirty benchmark runs:

i7 3930K, with / without a K20:     52   / 17.5 ns/day
FX 8350, with / without a GTX 580:  31.5 / 10.1 ns/day
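
For reference, here is a minimal .mdp sketch of the kind of setup described
above; the values are illustrative assumptions, not the exact inputs behind
these benchmarks:

    ; PME + virtual-site setup, assumed values
    integrator   = md
    dt           = 0.005      ; 5 fs step, made possible by hydrogen vsites
    coulombtype  = PME
    rcoulomb     = 0.9        ; nm
    rvdw         = 0.9        ; nm
    constraints  = all-bonds

The virtual sites themselves come from the topology, e.g. generated with
pdb2gmx -vsite hydrogens.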

I think the above Xeon may not be the best deal: it is based on the
now-outdated Sandy Bridge architecture, and an i7 4930K will be around 10%
faster; depending on your timeline, the Haswell-E 5930K (released this
fall) will be *far* better than either.

Additionally, unless the AMD CPUs are very cheap, my guess is that
you'll get better performance per buck (and per watt too) with mid-range
Haswells like the i5 4670/4690.

 Any pointers as to what would be the
 minimum configuration required for REMD simulations on, say, a 50K-atom
 protein sampled at 100 different temperatures? I am open to all possible
 options in this regard (obviously, a little cost-effectiveness does not
 hurt).

For a 100-way multi-run you'll need at least 100 cores, and even with
fast ones you won't get very good performance - especially without
GPUs. In fact, if you are planning to do REMD runs, you can make great
use of GPUs! The aggregate performance of independent runs sharing a
GPU (but not CPU cores) can be much greater than what you can achieve
with a single run on the same GPU-CPU pair; for an example, see the
second plot on this poster: http://goo.gl/2xH52y
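
As a concrete sketch, a 100-replica REMD setup could be driven like the
following (the temperature range, file names, and GROMACS 5.0-style gmx
commands are assumptions for illustration; -multidir and -replex are
standard mdrun options):

    # Geometric temperature ladder, one directory per replica;
    # template.mdp (with a ref_t line), conf.gro, and topol.top are
    # assumed to exist.
    T0=300; T1=450; N=100
    for i in $(seq 0 $((N-1))); do
        T=$(awk -v i=$i -v n=$N -v t0=$T0 -v t1=$T1 \
            'BEGIN { printf "%.2f", t0 * (t1/t0)^(i/(n-1)) }')
        mkdir -p replica_$i
        sed "s/^ref_t.*/ref_t = $T/" template.mdp > replica_$i/grompp.mdp
        (cd replica_$i && gmx grompp -f grompp.mdp -c ../conf.gro -p ../topol.top)
    done

    # One rank per replica; with -multidir, the ranks on each node share
    # the GPUs mdrun detects there. -replex 500 attempts exchanges every
    # 500 steps.
    mpirun -np 100 gmx_mpi mdrun -multidir replica_{0..99} -replex 500

A geometric ladder is the usual starting point, since it gives roughly
uniform exchange acceptance between neighboring replicas.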

Hardware-wise, with mid-range desktop Haswell CPUs, I guess you can
get about 25 ns/day and ~75 ns/day if you add a (fast enough) GPU; you
can bump this by another ~20% (aggregate) if you run 2-4 independent
runs per node. NOTE: I can't vouch for any of these numbers, they're
guesstimates.
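
A minimal sketch of what two such independent runs per node might look
like (run names and core split are assumptions; the pinning flags are
standard mdrun options):

    # Two runs sharing GPU 0 on a node with 8 logical cores,
    # pinned to disjoint core sets so they don't compete
    gmx mdrun -deffnm runA -ntmpi 1 -ntomp 4 -pin on -pinoffset 0 -gpu_id 0 &
    gmx mdrun -deffnm runB -ntmpi 1 -ntomp 4 -pin on -pinoffset 4 -gpu_id 0 &
    wait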

 Also, would investing in a *good* 40 Gigabit Ethernet network ensure good
 performance if we later plan to add more nodes to the cluster?

As I wrote before, I personally don't have experience with MD over
Ethernet. Traditionally, Ethernet has always been considered borderline
useless, but with the iWARP RDMA protocol over 10 and 40 Gb Ethernet,
I've seen people report decent results.

 Regards,
 Abhishek



Re: [gmx-users] Performance of beowulf cluster

2014-08-11 Thread Abhi Acharya
Thank you Dr. Szilard.

This was really helpful. Incidentally, we eventually decided on an
i7-4930K, so we got that right ;).

As advised, we have now junked the idea of an Ethernet cluster. We will be
testing the GPU systems first, and then decide on a further course of
action.

Thanks again.

Regards,
Abhishek




Re: [gmx-users] Performance of beowulf cluster

2014-08-05 Thread Mirco Wahab


6 AMD FX-8350 boxes, connected to *one* 1 Gb switch?

This system could be put to very good use if you are
able to perform 6 *independent simulations* on your
molecular system.

100,000 atoms is a rather small system for large-scale
parallelization. A 100K SPC water box would have an edge length
of about 10 nm.
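
A quick back-of-the-envelope check (assuming 3 atoms per water and
roughly 33 water molecules per nm^3 at ambient density):

    100,000 atoms / 3 atoms per water  ~ 33,000 waters
    33,000 waters / 33 per nm^3        ~ 1,000 nm^3
    (1,000 nm^3)^(1/3)                 = 10 nm edge length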

If it's important for you to have parallel runs on single
molecular systems, you could consider a dual-socket-2011
system running 6-core i7-class processors (i7 4930K or the upcoming
Haswell-E 5930K) combined with quad-channel DDR3/4.
This would give you 24-way parallelism (2 sockets × 6 cores × 2
hardware threads) on a single workstation.

What about modern (Nvidia) consumer graphics cards? These are
supported very well by Gromacs.

Regards

M.



Re: [gmx-users] Performance of beowulf cluster

2014-08-05 Thread Szilárd Páll
Hi,

You need a fast network to parallelize across multiple nodes. 1 Gb
Ethernet won't work well, and even 10/40 Gb Ethernet needs to be
of good quality; you'd likely need to buy separate adapters, as the
on-board ones won't perform well. I posted some links to the list
related to this a few days ago.
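
If you do go multi-node, a quick way to sanity-check network scaling is
to rerun the same short benchmark over a growing set of nodes (the file
names and node counts here are assumptions; -nsteps, -resethway, and -g
are standard mdrun options):

    # 8 cores per FX 8350 node; bench.tpr and the hosts.* files are assumed
    for n in 1 2 4 6; do
        mpirun -np $((8*n)) -hostfile hosts.$n \
            gmx_mpi mdrun -s bench.tpr -nsteps 10000 -resethway -g scale_$n.log
    done

Compare the ns/day reported in the logs; if it stops improving after a
couple of nodes, the network is the bottleneck.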

The AMD FX desktop hardware you mention is OK, but I'm not sure that
it gives the best performance/price. A (very) discounted
Sandy Bridge-E (i7 3930K), if you can find one, or the cheaper Haswells
like the i5 4670 may actually provide better performance for the money.
Ivy Bridge-E or Haswell-E, as Mirco suggests, are the best single-socket
workstation options, but those are/will be pretty expensive.

Finally, unless you have a good reason not to, you should not just
consider which GPU to get, but also which CPU/platform works best with
GPUs.

Cheers,
--
Szilárd




Re: [gmx-users] Performance of beowulf cluster

2014-08-05 Thread Abhi Acharya
Thank you Mirco and Szilard,
With regard to the GPU system, I have decided on a Xeon E5-1650 v2 system
with a GeForce GTX 780 Ti GPU for equilibration and production runs with
small systems. But for large systems or REMD simulations, I am a bit
skeptical about banking on GPU systems. Any pointers as to what would be
the minimum configuration required for REMD simulations on, say, a
50K-atom protein sampled at 100 different temperatures? I am open to all
possible options in this regard (obviously, a little cost-effectiveness
does not hurt).
Also, would investing in a *good* 40 Gigabit Ethernet network ensure good
performance if we later plan to add more nodes to the cluster?

Regards,
Abhishek






-- 
Abhishek Acharya
Senior Research Fellow
Gene Regulation Laboratory
National Institute of Immunology


[gmx-users] Performance of beowulf cluster

2014-08-04 Thread Abhishek Acharya
Hello gromacs users,
I am planning on investing in a beowulf cluster with 6 nodes (48 cores),
each with an AMD FX-8350 processor and 8 GB of memory, connected by a
1 Gigabit Ethernet switch. Although I plan to add more cores to this
cluster later on, what is the max performance expected from the current
specs for a 100,000-atom simulation box? Also, is it better to invest in
a single 48-core server? The cluster system can be set up at almost half
the price of a 48-core server, but do we lose out on performance in the
process?

Regards, 

Abhishek Acharya
-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.