Re: [gmx-users] AVX2 SIMD intrinsics speed boost

2013-07-12 Thread Erik Lindahl
Hi,

We will have AVX2 acceleration ready for general usage before the end of July 
(together with some other goodies), and it will be markedly faster, but until 
it's ready and tested to give correct results we can't say anything about the 
final performance.

However, in general AVX2 will have the largest effect on the same parts of the 
code that run on the GPU when one is present (the short-range nonbonded 
kernels), which means it might not provide a huge speedup when used in 
combination with accelerators.
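A toy step-time model makes the point above concrete. This is a minimal Python sketch, not how mdrun actually schedules work: it assumes the GPU nonbonded work fully overlaps the remaining CPU work, and the SIMD gains (1.30x on nonbonded kernels, 1.05x on everything else), the CPU-only timings, and the function names `cpu_only_step` and `gpu_offload_step` are all hypothetical. The GPU case reuses the GPU/CPU times from the log line in the quoted message.

```python
def cpu_only_step(t_nonbonded, t_rest, simd_gain_nb=1.0, simd_gain_rest=1.0):
    """Per-step time (ms) when the nonbonded kernels run on the CPU (toy model)."""
    return t_nonbonded / simd_gain_nb + t_rest / simd_gain_rest


def gpu_offload_step(t_gpu_nonbonded, t_cpu_rest, simd_gain_rest=1.0):
    """Per-step time (ms) when nonbonded work runs on the GPU and fully overlaps
    the remaining CPU work. Toy model: ignores launch, transfer and sync costs."""
    return max(t_gpu_nonbonded, t_cpu_rest / simd_gain_rest)


# Hypothetical SIMD gains: 1.30x on nonbonded kernels, 1.05x on the rest.
NB_GAIN, REST_GAIN = 1.30, 1.05

# CPU-only run, hypothetical timings: 3.0 ms nonbonded + 1.0 ms everything else.
cpu_speedup = cpu_only_step(3.0, 1.0) / cpu_only_step(3.0, 1.0, NB_GAIN, REST_GAIN)

# GPU run, using the GPU/CPU times from the log line in the quoted message.
# The nonbonded gain does not appear: that work is no longer on the CPU.
gpu_speedup = gpu_offload_step(1.762, 1.150) / gpu_offload_step(1.762, 1.150, REST_GAIN)

print(f"CPU-only speedup: {cpu_speedup:.2f}x, with GPU: {gpu_speedup:.2f}x")
```

With these made-up inputs it prints a 1.23x CPU-only speedup and 1.00x with the GPU, because the GPU side stays the longer of the two overlapping parts.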

Cheers,

E.

On Jul 11, 2013, at 6:01 PM, Bin Liu fdusuperstr...@gmail.com wrote:

 Hi all,
 
 If my understanding is correct, the GROMACS parallelization and acceleration
 page indicates that AVX2 SIMD intrinsics can offer a speed boost on a Haswell
 CPU. I was wondering how much performance gain we can expect from it. In
 other words, what is the approximate speed increase when running a simulation
 with AVX2 SIMD intrinsics on a Haswell CPU (say, an i7 4770K) compared with
 running it on an Ivy Bridge CPU at the same clock (say, an i7 3770K) with the
 current AVX SIMD intrinsics? And is there a timeline for the release of AVX2
 SIMD intrinsics?
 
 This information is crucial if we want to assemble a machine with balanced
 CPU and GPU performance. My current machine has an i7 3770K (3.5GHz, stock
 frequency) and a GeForce 650 Ti (768 CUDA cores, 1032MHz). When I ran
 simulations with rcoulomb=1.0 and rvdw=1.0, I got this at the end of the
 log file:
 
     Force evaluation time GPU/CPU: 1.762 ms/1.150 ms = 1.531

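The balance number on that log line is the GPU time divided by the CPU time: above 1, the CPU finishes its share and waits for the GPU. A minimal Python sketch of extracting and reading it follows; the regular expression is written only against the single line shown here (the md.log wording may differ in other GROMACS versions), and `parse_balance`, `bottleneck` and the 5% `tolerance` are hypothetical names and thresholds chosen for illustration.

```python
import re

# Pattern written against the one log line quoted above; other versions may differ.
BALANCE_RE = re.compile(
    r"Force evaluation time GPU/CPU:\s*([\d.]+)\s*ms/([\d.]+)\s*ms\s*=\s*([\d.]+)"
)


def parse_balance(line):
    """Return (gpu_ms, cpu_ms, ratio) from a 'Force evaluation time' line, or None."""
    m = BALANCE_RE.search(line)
    if m is None:
        return None
    gpu_ms, cpu_ms, ratio = (float(g) for g in m.groups())
    return gpu_ms, cpu_ms, ratio


def bottleneck(ratio, tolerance=0.05):
    """Ratio > 1: the CPU waits for the GPU; ratio < 1: the GPU waits for the CPU."""
    if ratio > 1 + tolerance:
        return "GPU-bound"
    if ratio < 1 - tolerance:
        return "CPU-bound"
    return "balanced"


line = "Force evaluation time GPU/CPU: 1.762 ms/1.150 ms = 1.531"
gpu_ms, cpu_ms, ratio = parse_balance(line)
print(gpu_ms, cpu_ms, ratio, bottleneck(ratio))
```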
 It seems I need a GPU with about 50% more CUDA cores. In the best-case
 scenario, if AVX2 gives a 30% speed boost and I can successfully overclock
 the 4770K to 4.5GHz, I need about 1965 CUDA cores
 (1.30 * (4.5GHz/3.5GHz) * 1.531 * 768 cores) at the same frequency to get
 balanced CPU and GPU performance. The closest match is then a GeForce GTX 780
 (2304 CUDA cores at 863MHz, equivalent to about 1925 CUDA cores at 1032MHz).
 Since GROMACS is largely insensitive to memory clock and latency, I hope this
 naive arithmetic gives a good estimate of which graphics card I should
 purchase.
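The estimate above can be written out as a short Python calculation. It keeps the message's own assumptions (GPU throughput proportional to CUDA cores times clock, CPU throughput proportional to clock, a hypothetical 30% AVX2 gain); `required_cores` and `clock_equivalent_cores` are illustrative names, not part of any tool.

```python
def required_cores(current_cores, gpu_cpu_ratio, cpu_speedup, clock_new_ghz, clock_old_ghz):
    """Cores needed (at the current GPU clock) to balance a faster CPU.

    Naive model: the GPU must close today's imbalance (gpu_cpu_ratio) and also
    keep up with the CPU's gain (SIMD speedup x clock ratio), assuming linear
    scaling with cores and clock.
    """
    cpu_gain = cpu_speedup * (clock_new_ghz / clock_old_ghz)
    return current_cores * gpu_cpu_ratio * cpu_gain


def clock_equivalent_cores(cores, clock_mhz, reference_mhz):
    """Express a card's cores at clock_mhz as cores at reference_mhz (linear scaling)."""
    return cores * clock_mhz / reference_mhz


need = required_cores(768, 1.531, 1.30, 4.5, 3.5)
gtx780 = clock_equivalent_cores(2304, 863, 1032)
print(f"required: {need:.0f}, GTX 780 equivalent: {gtx780:.0f}")
```

Evaluated this way, the formula comes to roughly 1965 cores and the GTX 780 to about 1927 core-equivalents at 1032MHz, i.e. about 2% short of exact balance under these linear-scaling assumptions.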
 
 Best
 
 Bin
 -- 
 gmx-users mailing list    gmx-users@gromacs.org
 http://lists.gromacs.org/mailman/listinfo/gmx-users
 * Please search the archive at 
 http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
 * Please don't post (un)subscribe requests to the list. Use the 
 www interface or send it to gmx-users-requ...@gromacs.org.
 * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists



[gmx-users] AVX2 SIMD intrinsics speed boost

2013-07-11 Thread Bin Liu