Re: [gmx-users] Re: Umbrella sampling question

2012-11-16 Thread Erik Marklund
Hi,

Blindly defining the center of mass for a group of atoms is not possible in a 
periodic system such as a typical simulation box. You need some clue as to 
which periodic copy of every atom that is to be chosen. By providing 
pull_pbcatom0 you tell gromacs to, for every atom in grp0, use the periodic 
copy closest to the atom given by pull_pbcatom0. If you have large pullgroups 
this is necessary to define the inter-group distance in a way that makes sense. 
If you get different results depending on that setting you really need to 
figure out which atom is a good center for your calculations. The default 
behavior is to use the atom whose *index* is in the center of the group. If you 
for example have a dimeric protein this may correspond to the C-terminus of the 
first chain or the N-terminus of the second one, which in turn often doesn't 
coincide with the geometrical center of the group. I suggest you try yet 
another choice of pull_pbcatom0 that is also close to the center to see if that 
also give rise to a different distance. As mentioned, the choice of 
pull_pbcatom0 should not matter as long as the choice allows to figure out how 
to handle the periodicity.

Best,

Erik

15 nov 2012 kl. 19.56 skrev Gmx QA:

 Hi Chris
 
 Seems my confusion was that I assumed that the distances in the
 profile.xvg-file should correspond to something I could measure with
 g_dist. Turns out it does not.
 Thank you for helping me sorting out this, I got it now :-)
 
 About pull_pbcatom0 though. My box is  2*1.08 nm in all directions:
 $ tail -n 1 conf0.gro
  12.45770  12.45770  17.99590
 
 I am still not sure what pull_pbcatom0 does. You said it should not have
 any effect on my results, but changing it does result in a different
 initial distance reported by grompp.
 
 In my initial attempts at this, I did not specify anything for
 pull_pbcatom0, but in the grompp output I get this
 
 Pull group  natoms  pbc atom  distance at start reference at t=0
   0 21939 10970
   1 1 0   2.083 2.083
 Estimate for the relative computational load of the PME mesh part: 0.10
 This run will generate roughly 761 Mb of data
 
 
 Then, following the advice in the thread I referred to earlier, I set
 pull_pbcatom0 explicitly in the mdp-file to be an atom close to the COM of
 the Protein. Then I get from grompp
 
 Pull group  natoms  pbc atom  distance at start reference at t=0
   0 21939  7058
   1 1 0   1.808 1.808
 Estimate for the relative computational load of the PME mesh part: 0.10
 This run will generate roughly 761 Mb of data
 
 As you can see, the initial distance is different (2.083 vs 1.808), and
 1.808 is the same as the distance reported by g_dist. Do you have any
 comments here as to why this is?
 
 Thanks
 /PK
 
 
 What you reported is not what you did. It appears that grompp, gtraj, and
 g_dist report the same value.
 Please also report the value that you get from your pullx.xvg file that you
 get from mdrun, which I suspect
 will also be the same.
 
 The difference that you report is actually between the first FRAME of your
 trajectory from g_dist
 and the first LINE of the file from the g_wham output. I see no reason to
 assume that the values in the
 output of g_wham must be time-ordered. Also, I have never used g_wham
 myself (I use an external program
 to do wham) and so I can not say if you are using it correctly.
 
 My overall conclusion is that you need to investigate g_wham output not
 worry about a new run at this stage.
 
 Regarding pull_pbcatom0, there is lots of information on the mailing list
 about this. It is a global atom number
 that defines the unit cell for selection of which periodic image of each
 molecule will be used for the pulling.
 If all of your box dimensions are  2*1.08 nm, then pull_pbcatom0 will not
 affect your results.
 
 Chris.
 -- 
 gmx-users mailing listgmx-users@gromacs.org
 http://lists.gromacs.org/mailman/listinfo/gmx-users
 * Please search the archive at 
 http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
 * Please don't post (un)subscribe requests to the list. Use the 
 www interface or send it to gmx-users-requ...@gromacs.org.
 * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

---
Erik Marklund, PhD
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596,75124 Uppsala, Sweden
phone:+46 18 471 6688fax: +46 18 511 755
er...@xray.bmc.uu.se
http://www2.icm.uu.se/molbio/elflab/index.html

--
gmx-users mailing listgmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read 

Re: [gmx-users] Re: Umbrella sampling question

2012-11-16 Thread Gmx QA
Thanks Erik!

/PK

2012/11/16 Erik Marklund er...@xray.bmc.uu.se

 Hi,

 Blindly defining the center of mass for a group of atoms is not possible
 in a periodic system such as a typical simulation box. You need some clue
 as to which periodic copy of every atom that is to be chosen. By providing
 pull_pbcatom0 you tell gromacs to, for every atom in grp0, use the periodic
 copy closest to the atom given by pull_pbcatom0. If you have large
 pullgroups this is necessary to define the inter-group distance in a way
 that makes sense. If you get different results depending on that setting
 you really need to figure out which atom is a good center for your
 calculations. The default behavior is to use the atom whose *index* is in
 the center of the group. If you for example have a dimeric protein this may
 correspond to the C-terminus of the first chain or the N-terminus of the
 second one, which in turn often doesn't coincide with the geometrical
 center of the group. I suggest you try yet another choice of pull_pbcatom0
 that is also close to the center to see if that also give rise to a
 different distance. As mentioned, the choice of pull_pbcatom0 should not
 matter as long as the choice allows to figure out how to handle the
 periodicity.

 Best,

 Erik

 15 nov 2012 kl. 19.56 skrev Gmx QA:

  Hi Chris
 
  Seems my confusion was that I assumed that the distances in the
  profile.xvg-file should correspond to something I could measure with
  g_dist. Turns out it does not.
  Thank you for helping me sorting out this, I got it now :-)
 
  About pull_pbcatom0 though. My box is  2*1.08 nm in all directions:
  $ tail -n 1 conf0.gro
   12.45770  12.45770  17.99590
 
  I am still not sure what pull_pbcatom0 does. You said it should not have
  any effect on my results, but changing it does result in a different
  initial distance reported by grompp.
 
  In my initial attempts at this, I did not specify anything for
  pull_pbcatom0, but in the grompp output I get this
 
  Pull group  natoms  pbc atom  distance at start reference at t=0
0 21939 10970
1 1 0   2.083 2.083
  Estimate for the relative computational load of the PME mesh part: 0.10
  This run will generate roughly 761 Mb of data
 
 
  Then, following the advice in the thread I referred to earlier, I set
  pull_pbcatom0 explicitly in the mdp-file to be an atom close to the COM
 of
  the Protein. Then I get from grompp
 
  Pull group  natoms  pbc atom  distance at start reference at t=0
0 21939  7058
1 1 0   1.808 1.808
  Estimate for the relative computational load of the PME mesh part: 0.10
  This run will generate roughly 761 Mb of data
 
  As you can see, the initial distance is different (2.083 vs 1.808), and
  1.808 is the same as the distance reported by g_dist. Do you have any
  comments here as to why this is?
 
  Thanks
  /PK
 
 
  What you reported is not what you did. It appears that grompp, gtraj, and
  g_dist report the same value.
  Please also report the value that you get from your pullx.xvg file that
 you
  get from mdrun, which I suspect
  will also be the same.
 
  The difference that you report is actually between the first FRAME of
 your
  trajectory from g_dist
  and the first LINE of the file from the g_wham output. I see no reason to
  assume that the values in the
  output of g_wham must be time-ordered. Also, I have never used g_wham
  myself (I use an external program
  to do wham) and so I can not say if you are using it correctly.
 
  My overall conclusion is that you need to investigate g_wham output not
  worry about a new run at this stage.
 
  Regarding pull_pbcatom0, there is lots of information on the mailing list
  about this. It is a global atom number
  that defines the unit cell for selection of which periodic image of each
  molecule will be used for the pulling.
  If all of your box dimensions are  2*1.08 nm, then pull_pbcatom0 will
 not
  affect your results.
 
  Chris.
  --
  gmx-users mailing listgmx-users@gromacs.org
  http://lists.gromacs.org/mailman/listinfo/gmx-users
  * Please search the archive at
 http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
  * Please don't post (un)subscribe requests to the list. Use the
  www interface or send it to gmx-users-requ...@gromacs.org.
  * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

 ---
 Erik Marklund, PhD
 Dept. of Cell and Molecular Biology, Uppsala University.
 Husargatan 3, Box 596,75124 Uppsala, Sweden
 phone:+46 18 471 6688fax: +46 18 511 755
 er...@xray.bmc.uu.se
 http://www2.icm.uu.se/molbio/elflab/index.html

 --
 gmx-users mailing listgmx-users@gromacs.org
 http://lists.gromacs.org/mailman/listinfo/gmx-users
 * Please search the archive at
 http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
 * Please don't post (un)subscribe 

Re: [gmx-users] Re: Umbrella sampling question

2012-11-15 Thread Gmx QA
Hi Erik,

Thank you for your answer.

I see your point now. Went and had a look in gmx_wham.c to see how things
are calculated, and that makes sense.

I was looking for an easy way of relating different parts of the resulting
PMF to my original starting frames, as a means to understand exactly what
the PMF is telling me, and I thought that by mapping distances in each
frame to entries in the profile.xvg-file, I could accomplish that. I see
that's not the case.

I am still learning Umbrella sampling and WHAM, and some of it I still do
not get fully. For example, how best (or easiest) to relate (a) events and
structures in individual frames (or in the pulling simulation) to (b) the
pull force vs time plot and (c) the PMF-plot.

Thanks
/PK

2012/11/14 Erik Marklund er...@xray.bmc.uu.se

 Hi,

 See below.

 14 nov 2012 kl. 15.06 skrev Gmx QA:

  Hi Chris,
 
  and thank you for your reply. I should have included my g_dist command in
  my first mail. Here goes:
 
  I first run trjconv to extract the individual frames from my pulling
  trajectory
  $ trjconv -f pull.xtc -s pull.tpr -o conf.gro -sep -pbc mol -ur compact
 
  Then, g_dist like so:
  $ g_dist -s pull.tpr -f conf0.gro -n index.ndx -o dist0.xvg
 
  where index.ndx is an index-file that contains both pull-groups (ie the
  Protein in one group, and another group with the single atom from the
  molecule I'm pulling). I've checked that is is the same as in the
 mdp-file
  for the pulling simulation.
 
  From this dist0.xvg-file I extract the z-component of the distance:
  $ tail -n 1 dist0.xvg | awk '{print $5}'
  -1.8083301
 
  And this is also the same value I get using g_traj to manually calculate
 it.
 
  Now, I run g_wham with this command line:
  $ g_wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal
 
  which gives me a profile.xvg file (per the default -o flag) containing
 the
  PMF.
  The top-part of the profile.xvg file looks like this:
 
  # This file was created Tue Nov  6 09:56:23 2012
  # by the following command:
  # g_wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal
  #
  # g_wham is part of G R O M A C S:
  #
  # Gromacs Runs On Most of All Computer Systems
  #
  @title Umbrella potential
  @xaxis  label z
  @yaxis  label E (kcal mol\S-1\N)
  @TYPE xy
  1.761971e+000.00e+00
  1.782558e+00-1.436057e-01
  1.803145e+00-3.857955e-01
 
  and I assumed that the distances reported here should be the same as the
  distances calculated by g_dist for each frame, but as you can see they
 are
  not (1.761971 vs 1.8083301)
 
  I have been trying to understand how the 1.761971 distance is calculated,
  but I do not get it since I use the pullf.xvg-files as input to g_wham.
 And
  even using the z-component of the distance from the pullx.xvg-file and
  calculating ie the average does not give me a value close to 1.761971.
 For
  the next couple of frames, there are also differences, but they are not
  constant.
 

 profile.xvg doesn't contain distances vs time, so 1.761971 has nothing to
 do with the first frame. Nor is it (very) close to the average distance,
 for the file is energy vs. your reaction coordinate. In your simulations
 1.761971 is the lowest distance ever encountered and g_wham sets that as a
 reference point for the rest of the free energy profile, hence the 0 in the
 next column. If you inspect the stdout from g_wham you may see how this is
 the case.

 Best,

 Erik

 
  I should add that after having sent my first mail, I found this thread:
 
 http://gromacs.5086.n6.nabble.com/umbrella-sampling-PMF-position-discrepancy-td5000886.html
  which fairly well describes my problem. Following the suggestion to
 change
  (or in my case add, it was not there originally) the pull_pbcatom0 entry
 to
  the mdp-file as an atom that is close to the COM of the protein, gives
 me a
  distance of 1.809 reported from running grompp (when making the tpr for
 the
  pulling simulation). I've started another pulling run with this new tpr,
  but I don't like making changes I don't understand. I guess I'm not clear
  about what pull_pbcatom0 does, but I though the distance in this case was
  rather unambiguous.
 
  Thanks
  /PK
 
 
  Can you please let us know exactly how you got the two values that you
 find
  to be different (but expected
  to be the same)? i.e. post your full g_dist command and explain how you
  observed the value in the output from
  mdrun. One frame should be enough for now (as long as you are sure --
 and
  can prove to us -- that it is
  indeed the same frame).
 
  You don't define profile.xvg and I wonder if the discrepancy might be
  related to your use of pull_start=yes
  Do you see a consistent bias between mdrun output and g_dist output if
 you
  subtract the values for
  different time frames? If so, is this difference the same as the
 distance
  in the initial frame (pull_start=yes)?
 
  Chris.
 
  -- original message --
 
  I'm a new poster on the maillist, and new to 

Re: [gmx-users] Re: Umbrella sampling question

2012-11-14 Thread Erik Marklund
Hi,

See below.

14 nov 2012 kl. 15.06 skrev Gmx QA:

 Hi Chris,
 
 and thank you for your reply. I should have included my g_dist command in
 my first mail. Here goes:
 
 I first run trjconv to extract the individual frames from my pulling
 trajectory
 $ trjconv -f pull.xtc -s pull.tpr -o conf.gro -sep -pbc mol -ur compact
 
 Then, g_dist like so:
 $ g_dist -s pull.tpr -f conf0.gro -n index.ndx -o dist0.xvg
 
 where index.ndx is an index-file that contains both pull-groups (ie the
 Protein in one group, and another group with the single atom from the
 molecule I'm pulling). I've checked that is is the same as in the mdp-file
 for the pulling simulation.
 
 From this dist0.xvg-file I extract the z-component of the distance:
 $ tail -n 1 dist0.xvg | awk '{print $5}'
 -1.8083301
 
 And this is also the same value I get using g_traj to manually calculate it.
 
 Now, I run g_wham with this command line:
 $ g_wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal
 
 which gives me a profile.xvg file (per the default -o flag) containing the
 PMF.
 The top-part of the profile.xvg file looks like this:
 
 # This file was created Tue Nov  6 09:56:23 2012
 # by the following command:
 # g_wham -it tpr-files.dat -if pullf-files.dat -o -hist -unit kCal
 #
 # g_wham is part of G R O M A C S:
 #
 # Gromacs Runs On Most of All Computer Systems
 #
 @title Umbrella potential
 @xaxis  label z
 @yaxis  label E (kcal mol\S-1\N)
 @TYPE xy
 1.761971e+000.00e+00
 1.782558e+00-1.436057e-01
 1.803145e+00-3.857955e-01
 
 and I assumed that the distances reported here should be the same as the
 distances calculated by g_dist for each frame, but as you can see they are
 not (1.761971 vs 1.8083301)
 
 I have been trying to understand how the 1.761971 distance is calculated,
 but I do not get it since I use the pullf.xvg-files as input to g_wham. And
 even using the z-component of the distance from the pullx.xvg-file and
 calculating ie the average does not give me a value close to 1.761971. For
 the next couple of frames, there are also differences, but they are not
 constant.
 

profile.xvg doesn't contain distances vs time, so 1.761971 has nothing to do 
with the first frame. Nor is it (very) close to the average distance, for the 
file is energy vs. your reaction coordinate. In your simulations 1.761971 is 
the lowest distance ever encountered and g_wham sets that as a reference point 
for the rest of the free energy profile, hence the 0 in the next column. If you 
inspect the stdout from g_wham you may see how this is the case.

Best,

Erik

 
 I should add that after having sent my first mail, I found this thread:
 http://gromacs.5086.n6.nabble.com/umbrella-sampling-PMF-position-discrepancy-td5000886.html
 which fairly well describes my problem. Following the suggestion to change
 (or in my case add, it was not there originally) the pull_pbcatom0 entry to
 the mdp-file as an atom that is close to the COM of the protein, gives me a
 distance of 1.809 reported from running grompp (when making the tpr for the
 pulling simulation). I've started another pulling run with this new tpr,
 but I don't like making changes I don't understand. I guess I'm not clear
 about what pull_pbcatom0 does, but I though the distance in this case was
 rather unambiguous.
 
 Thanks
 /PK
 
 
 Can you please let us know exactly how you got the two values that you find
 to be different (but expected
 to be the same)? i.e. post your full g_dist command and explain how you
 observed the value in the output from
 mdrun. One frame should be enough for now (as long as you are sure -- and
 can prove to us -- that it is
 indeed the same frame).
 
 You don't define profile.xvg and I wonder if the discrepancy might be
 related to your use of pull_start=yes
 Do you see a consistent bias between mdrun output and g_dist output if you
 subtract the values for
 different time frames? If so, is this difference the same as the distance
 in the initial frame (pull_start=yes)?
 
 Chris.
 
 -- original message --
 
 I'm a new poster on the maillist, and new to umbrella sampling but not to
 MD in general.
 
 I have recently done some pulling simulation with Gromacs, but have a few
 questions about the outcome, and in particular the distances that are
 calculated for the different windows. I realize this is a question that has
 come up before, and there are some useful posts in the archives, but
 nothing that exactly answers my questions.
 
 To begin, I ran a pulling-simulation pulling a molecule starting in the
 middle of a membrane (and inside a protein transport channel) straight up
 in z. To that end, I used the following mdp-settings:
 
 pull = umbrella
 pull_geometry = direction
 pull_vec1 = 0 0 1
 pull_ngroups = 1
 pull_start = yes
 pull_group0 = Protein
 pull_group1 = pg1
 pull_rate1 = 0.001
 pull_k1 = 500
 
 So my pull groups are the COM of the protein, and one atom named pg1 on the
 pull molecule.
 To my understanding, this should