Dear All

I am trying to run free-energy simulations with the TI method in GROMACS
2019.1 on a GPU machine (containing two NVIDIA GeForce GTX 1080 Ti cards).
Unfortunately, I am unable to run the free-energy simulation on the GPU.

The normal MD simulation (without free energy) runs perfectly on the GPU,
which gives us an excellent speed-up. For example, a 100 K-atom system
reaches ~80 ns/day on one GPU card (with > 80% GPU usage).
When I run the free-energy simulation for the same system, the performance
drops drastically to ~0.02 ns/day (with 0% GPU usage).

I am pasting the MDP files for the normal MD simulation and the free-energy
simulation below.

npt.mdp (normal MD simulation):

title        = MD simulation
; Run parameters
integrator    = md        ; leap-frog integrator
nsteps        = 100000000      ; 0.002 ps * 100000000 = 200 ns
dt        = 0.002        ; 2 fs
; Output control
nstxout                = 100000      ; save coordinates every 200 ps
nstvout                = 100000      ; save velocities every 200 ps
nstfout            = 100000      ; save forces every 200 ps
nstenergy            = 500        ; save energies every 1.0 ps
nstlog                = 5000        ; update log file every 10.0 ps
nstxout-compressed      = 5000          ; save compressed coordinates every 10.0 ps, nstxout-compressed replaces nstxtcout
compressed-x-grps       = System        ; replaces xtc-grps
; Bond parameters
continuation            = yes            ; Restarting after NVT
constraint_algorithm    = lincs            ; holonomic constraints
constraints            = h-bonds        ; H bonds constrained
lincs_iter            = 1            ; accuracy of LINCS
lincs_order            = 4            ; also related to accuracy
; Neighborsearching
cutoff-scheme       = Verlet
ns_type            = grid        ; search neighboring grid cells
nstlist            = 10        ; 20 fs, largely irrelevant with Verlet
rcoulomb        = 1.2        ; short-range electrostatic cutoff (in nm)
rvdw            = 1.2        ; short-range van der Waals cutoff (in nm)
rvdw-switch         = 1.0
vdwtype         = cutoff
vdw-modifier        = force-switch
rlist             = 1.2
; Electrostatics
coulombtype        = PME        ; Particle Mesh Ewald for long-range
pme_order        = 4                ; cubic interpolation
fourierspacing        = 0.16        ; grid spacing for FFT
; Temperature coupling is on
tcoupl        = V-rescale                ; modified Berendsen thermostat
tc-grps        = system            ;     Water           ; two coupling groups - more accurate
tau_t        = 0.1             ;    0.1          ; time constant, in ps
ref_t        = 360              ;    340             ; reference temperature, one for each group, in K
; Pressure coupling is on
;pcoupl                  =no
pcoupl                = Parrinello-Rahman        ; Pressure coupling on in
pcoupltype            = isotropic                ; uniform scaling of box
tau_p                = 2.0                    ; time constant, in ps
ref_p                = 1.0   ;1.0                 ; reference pressure, in bar
compressibility         = 4.5e-5 ; 4.5e-5            ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc        = xyz        ; 3-D PBC
; Dispersion correction
DispCorr    = no        ; account for cut-off vdW scheme
; Velocity generation
gen_vel        = no        ; Velocity generation is off
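
The output intervals in the comments of this file can be checked directly: each interval in ps is the nst* value multiplied by dt (0.002 ps here). A minimal Python sketch, assuming only the values from this npt.mdp (the helper name interval_ps is hypothetical):

```python
# Output interval (ps) = number of steps between writes * time step (ps).
DT_PS = 0.002  # dt from the normal-MD npt.mdp (2 fs)

# Step counts copied from the normal-MD npt.mdp above.
settings = {
    "nsteps": 100_000_000,
    "nstxout": 100_000,
    "nstvout": 100_000,
    "nstfout": 100_000,
    "nstenergy": 500,
    "nstlog": 5_000,
    "nstxout-compressed": 5_000,
}


def interval_ps(n_steps: int, dt_ps: float = DT_PS) -> float:
    """Convert a step count into simulated time in picoseconds."""
    return n_steps * dt_ps


if __name__ == "__main__":
    for key, value in settings.items():
        print(f"{key:20s} {value:>11d} steps = {interval_ps(value):g} ps")
```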

npt.mdp (free-energy simulation):

; Run control
integrator               = sd       ; Langevin dynamics
tinit                    = 0
dt                       = 0.002
nsteps                   = 50000    ; 100 ps
nstcomm                  = 100
; Output control
nstxout                  = 500
nstvout                  = 500
nstfout                  = 0
nstlog                   = 500
nstenergy                = 500
nstxout-compressed       = 0
; Neighborsearching and short-range nonbonded interactions
cutoff-scheme            = verlet
nstlist                  = 20
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.2
; Electrostatics
coulombtype              = PME
rcoulomb                 = 1.2
; van der Waals
vdwtype                  = cutoff
vdw-modifier             = potential-switch
rvdw-switch              = 1.0
rvdw                     = 1.2
; Apply long range dispersion corrections for Energy and Pressure
DispCorr                  = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 6
ewald_rtol               = 1e-06
epsilon_surface          = 0
; Temperature coupling
; tcoupl is implicitly handled by the sd integrator
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = 298
; Pressure coupling is on for NPT
Pcoupl                   = berendsen
tau_p                    = 1.0
compressibility          = 4.5e-05
ref_p                    = 1.0
; Free energy control stuff
free_energy              = yes
init_lambda_state        = 0
delta_lambda             = 0
calc_lambda_neighbors    = 1        ; only immediate neighboring windows
couple-moltype           = IO  ; name of moleculetype to decouple
couple-lambda0           = vdw     ; only van der Waals interactions
couple-lambda1           = vdw-q     ; turn off everything, in this case only vdW
couple-intramol          = no
; Vectors of lambda specified here
; Each combination is an index that is retrieved from init_lambda_state for each simulation
; init_lambda_state        0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
vdw_lambdas              = 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
coul_lambdas             = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
; We are not transforming any bonded or restrained interactions
bonded_lambdas           = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
restraint_lambdas        = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
; Masses are not changing (particle identities are the same at lambda = 0 and lambda = 1)
mass_lambdas             = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
; Not doing simulated tempering here
temperature_lambdas      = 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
; Options for the decoupling
sc-alpha                 = 0.5
sc-coul                  = no       ; linear interpolation of Coulomb (none in this case)
sc-power                 = 1
sc-sigma                 = 0.3
nstdhdl                  = 10
; Do not generate velocities
gen_vel                  = no
; options for bonds
constraints              = h-bonds  ; we only have C-H bonds here
; Type of constraint algorithm
constraint-algorithm     = lincs
; Constrain the starting configuration
; since we are continuing from NVT
continuation             = yes
; Highest order in the expansion of the constraint coupling matrix
lincs-order              = 12
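
The 21 lambda windows in this file follow a simple pattern: vdw_lambdas ramps from 0.00 to 1.00 in steps of 0.05, and every other lambda vector stays at 0.00. A minimal Python sketch that regenerates these lines, each vector written on a single line; the window count is taken from this file, and the helper names (mdp_vector_line, make_lambda_block) are hypothetical:

```python
from __future__ import annotations

# Assumption: 21 windows (init_lambda_state 0-20), as in the .mdp above.
N_WINDOWS = 21

# Only vdw_lambdas changes; every other lambda vector stays at 0.00.
CONSTANT_ZERO_VECTORS = (
    "coul_lambdas",
    "bonded_lambdas",
    "restraint_lambdas",
    "mass_lambdas",
    "temperature_lambdas",
)


def mdp_vector_line(name: str, values: list[float]) -> str:
    """Format one lambda vector as a single .mdp line (no line wrapping)."""
    return f"{name:<25s}= " + " ".join(f"{v:.2f}" for v in values)


def make_lambda_block(n_windows: int = N_WINDOWS) -> list[str]:
    """Return .mdp lines: vdw ramps linearly from 0 to 1, the rest stay at 0."""
    vdw = [i / (n_windows - 1) for i in range(n_windows)]
    lines = [mdp_vector_line("vdw_lambdas", vdw)]
    for name in CONSTANT_ZERO_VECTORS:
        lines.append(mdp_vector_line(name, [0.0] * n_windows))
    return lines


if __name__ == "__main__":
    print("\n".join(make_lambda_block()))
```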

To run the simulation, I am using the following command:

gmx mdrun -v -s MD.tpr -deffnm MD -nb gpu -ntomp 10 -gpu_id 0

Any help in solving this issue would be much appreciated.

Thanks in advance.

