The paper looks good. Do some more work and publish more.
On 17-Jan-2013, at 8:18 PM, James Starlight jmsstarli...@gmail.com wrote:
Dear Gromacs Developers!
Using the sd1 integrator I've obtained good performance with a Core i5 +
GTX 670 (13 ns/day) for a system of 60k atoms. That result is about
30% better than with the sd integrator.
But on another workstation, which differs only in having a slower GPU
(GT 640), I've observed a GPU/CPU mismatch.
Force evaluation time GPU/CPU: 6.835 ms / 2.026 ms = 3.373 (on the
first station with the GTX 670 I obtained a GPU/CPU ratio close to 1).
In both cases I'm using the same simulation parameters with 0.8 nm
cutoffs (it's also important that in the second case I've simulated a
different system, consisting of 33k atoms, by means of umbrella-sampling
pulling). Could you tell me how I could increase performance on my
second station (i.e. reduce the GPU/CPU ratio)? I've attached the log
for that simulation here: http://www.sendspace.com/file/x0e3z8
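For reference, a minimal .mdp fragment along the lines of the setup described above (sd1 integrator, 0.8 nm cutoffs with the Verlet scheme); all values other than the integrator and the cutoffs are illustrative guesses, not taken from the attached log:

```
integrator      = sd1      ; SD integrator variant available in the 4.6 betas
dt              = 0.002
cutoff-scheme   = Verlet
rlist           = 0.8
rcoulomb        = 0.8
rvdw            = 0.8
coulombtype     = PME
tcoupl          = no       ; the sd/sd1 integrators couple temperature themselves
tc-grps         = System
tau-t           = 1.0
ref-t           = 300
```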
James
2013/1/17 Szilárd Páll szilard.p...@cbr.su.se:
Hi,
Just to note for the users who might read this: the report is valid,
some non-thread-parallel code is the reason, and we hope to have a fix
in 4.6.0. For updates, follow issue #1211.
Cheers,
--
Szilárd
On Wed, Jan 16, 2013 at 4:45 PM, Berk Hess g...@hotmail.com wrote:
The issue I'm referring to is about a factor of 2 in update and
constraints, but here it's much more.
I just found out that the SD update is not OpenMP threaded (and I even
noted in the code why this is).
I reopened the issue and will find a solution.
Cheers.
Berk
Date: Wed, 16 Jan 2013 16:20:32 +0100
Subject: Re: [gmx-users] 60% slowdown with GPU / verlet and sd
integrator
From: mark.j.abra...@gmail.com
To: gmx-users@gromacs.org
We should probably note this effect on the wiki somewhere?
Mark
On Wed, Jan 16, 2013 at 3:44 PM, Berk Hess g...@hotmail.com wrote:
Hi,
Unfortunately this is not a bug, but a feature!
We made the non-bonded interactions so fast on the GPU that integration
and constraints now take a comparatively large share of the run time.
The sd1 integrator is almost as fast as the md integrator, but slightly
less accurate.
In most cases that's a good solution.
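To make the trade-off concrete, here is an illustrative (not authoritative) pair of .mdp choices; the thermostat settings for the md case are assumptions:

```
; option 1: sd1 -- SD dynamics, the integrator acts as the thermostat
integrator = sd1
tcoupl     = no

; option 2: md -- faster update, needs an explicit thermostat
integrator = md
tcoupl     = v-rescale
tc-grps    = System
tau-t      = 0.1
ref-t      = 300
```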
I closed the redmine issue:
http://redmine.gromacs.org/issues/1121
Cheers,
Berk
Date: Wed, 16 Jan 2013 17:26:18 +0300
Subject: Re: [gmx-users] 60% slowdown with GPU / verlet and sd
integrator
From: jmsstarli...@gmail.com
To: gmx-users@gromacs.org
Hi all!
I've also done some calculations with the SD integrator used as the
thermostat (without tcoupl). With a system of 65k atoms I obtained
10 ns/day on a GTX 670 and a quad-core i5.
I haven't run any simulations with the md integrator yet, so I should
test it.
James
2013/1/15 Szilárd Páll szilard.p...@cbr.su.se:
Hi Floris,
Great feedback, this needs to be looked into. Could you please file a
bug report, preferably with a tpr (and/or all inputs) as well as log
files.
Thanks,
--
Szilárd
On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens
floris_buel...@yahoo.comwrote:
Hi,
I'm seeing MD simulations running a lot slower with the sd integrator
than with md: ca. 10 vs. 30 ns/day for my 47,000-atom system. I found
no documented indication that this should be the case.
Timings and logs are pasted in below; wall time seems to be
accumulating in Update and Rest, adding up to 60% of the total. The
effect is still there without a GPU: ca. 40% slowdown when switching
from group to Verlet with the SD integrator.
System: Xeon E5-1620, 1x GTX 680, gromacs
4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
I didn't file a bug report yet as I don't have much variety of testing
conditions available right now; I hope someone else has a moment to
try to reproduce?
Timings (ns/day):
CPU:
  sd / verlet:  6
  sd / group:  10
  md / verlet:  9.2
  md / group:  11.4
GPU:
  sd / verlet: 11
  md / verlet: 29.8
**MD integrator, GPU / verlet

 M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels   NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field   VdW=Van der Waals   QSTab=quadratic-spline table
 W3=SPC/TIP3p   W4=TIP4p (single or pairs)
 VF=Potential and force   V=Potential only   F=Force only

 Computing:                        M-Number       M-Flops   % Flops
-------------------------------------------------------------------
 Pair Search distance check     1244.988096     11204.893       0.1
 NxN QSTab Elec. + VdW [F]    194846.615488   7988711.235      91.9
 NxN QSTab Elec. + VdW [VF]     2009.923008    118585.457       1.4
 1,4 nonbonded interactions       31.616322      2845.469       0.0
 Calc Weights                    703.010574     25308.381       0.3
 Spread Q Bspline              14997.558912     29995.118       0.3
 Gather F Bspline              14997.558912     89985.353       1.0
 3D-FFT                        47658.567884    381268.543       4.4
 Solve PME                        20.580896      1317.177       0.0
 Shift-X                           9.418458        56.511       0.0
 Angles                           21.879375      3675.735