Re: [ccp4bb] Deposition of riding H
On Sat, 2012-05-12 at 19:28 +0100, Yuri Pompeu wrote:
> Dear community, I am probably disturbing a sleeping bear

definitely so

> Reading the thread on hydrogen deposition with the model, I came across several arguments that make sense on their own, but when put together are puzzling and don't seem to converge to an answer.

this is true for most other recurring threads

> - Some argued that depositing riding hydrogens with the model may imply that your data had enough information for you to include hydrogen atoms in the final model.

Unless there is a remark that explicitly states that these are riding hydrogens, but nobody reads those

> This is definitely a problem, especially when dealing with non-experienced users that may think the model is more accurate than it really is.

they always do and they have no idea how accurate the model actually is

> - It seemed to be consensus that when software uses hydrogen restraints it can be beneficial geometrically and can also make your model a better description of your x-ray data.

I think nobody disputes that, although the benefit may vary from structure to structure

> Based on these two main arguments, many would agree that hydrogens should be included throughout refinement but not deposited.

I do agree and I won't deposit them myself, but then what others choose to deposit is really their choice.

> So this brings me to the last point that was also mentioned in the old thread. If you used riding hydrogens throughout refinement and arrived at a final model that you believe best describes your x-ray data to a certain level of accuracy (R values, geometry, map CC, etc...) would you not be invalidating the whole refinement process by going in and removing the hydrogen atoms right before deposition?

Not really. You report that you used riding hydrogens and you report the program you used to generate them. In theory, anyone can dig up the appropriate version and reproduce your results.

> So how would one avoid this Catch-22?

I don't think it's strictly a catch-22 situation. The issue is that, depending on what the structural model is used for, different forms of the pdb file may be most useful. The only situation I can imagine where having riding hydrogens is beneficial is algorithm development, and perhaps verification of how much differences in riding hydrogen treatment contribute to differences in things like R-values. Both are quite esoteric tasks and you already provide sufficient information (vide supra).

Cheers,
Ed.

--
Hurry up before we all come back to our senses!
Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H
On Saturday, 12 May 2012, Yuri Pompeu wrote:
> If you used riding hydrogens throughout refinement and arrived at a final model that you believe best describes your x-ray data to a certain level of accuracy (Rvalues, geometry, map CC, etc...) would you not be invalidating the whole refinement process by going in and removing the hydrogen atoms right before deposition?

My view: You are not removing hydrogen atoms at all. You are stating that the model being deposited includes riding hydrogens. The consumer of your model can regenerate the individual hydrogen coordinates from that information if needed, just as refmac does when you start a new refinement cycle with the riding hydrogen model selected. You don't need to output the individual hydrogen coordinates between cycles, or at deposition time, because they are adequately described by the riding hydrogen model.

You might as well ask why we remove all copies of the molecules in the crystal except for those in a single asymmetric unit. They are not really removed; they are implicit in the statement of the crystallographic symmetry.

Ethan
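Ethan's point that riding hydrogens are implicit in the heavy-atom model can be illustrated with a small sketch. It places a backbone amide H from the three heavy atoms around it, assuming a common geometric construction (H in the C-N-CA plane, opposite the bisector of the two N bonds). This is not taken from refmac or any other program; the function name `riding_amide_h` and the example coordinates are hypothetical, and the default 0.86 Å is the X-ray N-H distance quoted elsewhere in this thread.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return tuple(x / norm for x in a)


def riding_amide_h(c_prev, n, ca, bond_length=0.86):
    """Place a backbone amide H from its heavy-atom neighbours (illustrative only).

    c_prev, n, ca are Cartesian coordinates in Angstrom. The H is put in the
    C-N-CA plane, pointing away from both bonded neighbours of N.
    """
    # Unit vectors pointing from each bonded neighbour toward N.
    u1 = _unit(_sub(n, c_prev))
    u2 = _unit(_sub(n, ca))
    # Their sum points away from both neighbours, along the external bisector.
    direction = _unit(_add(u1, u2))
    return _add(n, tuple(bond_length * d for d in direction))


# Hypothetical coordinates for C(i-1), N(i) and CA(i), all in the z = 0 plane.
c_prev = (-1.15, 0.66, 0.0)
n = (0.0, 0.0, 0.0)
ca = (1.26, 0.73, 0.0)

h = riding_amide_h(c_prev, n, ca)
print("riding H position:", h)
```

Because the hydrogen position is a pure function of the heavy-atom coordinates and one tabulated distance, nothing is lost by omitting it from the file, provided the recipe is stated. George Sheldrick's AFIX 137 example later in the thread is the case where this breaks down, since a refined torsion angle is an extra parameter that the heavy atoms do not determine.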
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Nicholas,

Thank you for your reply.

>> [snip] it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. [/snip]
>
> This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do, mainly because
> (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),
> (b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,
> (c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed ?)].
> I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

I don't quite understand your point. We currently deposit electron densities and movies, so I don't see why depositing an energy-minimized structure would be so difficult. It doesn't need to be in the same pdb file as the model used in refinement, nor does it need to be deposited on the PDB server, but even if it does, is it not possible to have it as a new chain or new atom type in the current pdb file format?

> ps. May I say parenthetically that making the deposited models dependent on their intended usage would possibly qualify as 'fraud' ;-)

I don't quite understand this either. When I prepare a protein model for simulation, I remove all alternative conformations, add hydrogens, and then minimize the structure. If I make such a minimized structure available for others to use with full disclosure, how would that constitute fraud? I was going to start offering minimized models of our future structures on our lab website, but if that constitutes fraud, then I might have to rethink.

I don't know enough to argue with anyone here and that's not the intention of my posts - I am just trying to help figure out a way to resolve a significant problem that will likely resurface down the road. It would be helpful if the more experienced people here could start a discussion of 'how to resolve' the problems exposed by this thread so far - assuming that you agree that it's a problem worth your time.

Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology, Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122
Phone: 317-274-4371 Fax: 317-274-4686
email: qqho...@iupui.edu

> --
> Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Ethan,

>> mainly because (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),
>
> I do not think that is a valid criticism. In any field of science one might hypothesize that conducting a different kind of experiment and fitting it in accordance with a different theory would produce a different model. But that is only a hypothetical; it does not invalidate the analysis of the experiment you did do based on the data you did collect.

For the example I mentioned (diffuse scattering), the experiment would be identical. Although using only a subset of the available information may not invalidate the analysis performed, it is still not the best that can be done with the data in hand.

>> (b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,
>
> I don't know of anyone who consciously avoids using their prior knowledge to inform their current work. But yes, people with more experience may in the end deposit better models than people with little experience. That's why it is valuable to have automated tools like Molprobity to check a proposed model against established prior expectations. It's also one way this bulletin board is valuable, because it allows those with less experience to ask advice from those with more experience.

Most people would like to think that the models they deposit correspond to an 'objective' representation of the experimentally accessible physical reality. The validation tools, mainly by enforcing a uniformity of interpretation, discourage (rather than encourage) the incorporation in the model of prior knowledge about the problem at hand, and thus offer their users the safety of an 'objectively validated model'.

>> (c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed ?)].
>
> Could you expand on this point? I am not aware of any restriction on multiple backbone conformations, now or ever. It is true that our refinement programs have not always been very well suited to refine such a model, but that is not a fault of the PDB format.

I stand corrected on that. It was probably just me :-)

>> I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.
>
> That would be true if the refinement is conducted in real space. However, it is nearly universal to do the final refinement in reciprocal space.

The emphasis of what I said was clearly on model building, and not on the refinement methodology. The reference to the refinement program was again model-centric (ranging from the treatment of hydrogens to the bulk solvent model used).

Best regards,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
> [snip] it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. [/snip]

This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do, mainly because

(a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),
(b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,
(c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed ?)].

I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

My two cents,
Nicholas

ps. May I say parenthetically that making the deposited models dependent on their intended usage would possibly qualify as 'fraud' ;-)

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Saturday 18 September 2010, Nicholas M Glykos wrote:
>> [snip] it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. [/snip]
>
> This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do,

I guess I have more faith that we do in fact aim for that. Our data, programs, models, and insight are imperfect, but we do our best with what we have.

> mainly because (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),

I do not think that is a valid criticism. In any field of science one might hypothesize that conducting a different kind of experiment and fitting it in accordance with a different theory would produce a different model. But that is only a hypothetical; it does not invalidate the analysis of the experiment you did do based on the data you did collect.

> (b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,

I don't know of anyone who consciously avoids using their prior knowledge to inform their current work. But yes, people with more experience may in the end deposit better models than people with little experience. That's why it is valuable to have automated tools like Molprobity to check a proposed model against established prior expectations. It's also one way this bulletin board is valuable, because it allows those with less experience to ask advice from those with more experience.

> (c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed ?)].

Could you expand on this point? I am not aware of any restriction on multiple backbone conformations, now or ever. It is true that our refinement programs have not always been very well suited to refine such a model, but that is not a fault of the PDB format.

> I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

That would be true if the refinement is conducted in real space. However, it is nearly universal to do the final refinement in reciprocal space. If a maximum likelihood residual is used, the aim is to achieve the best model in the generally accepted formal sense of being the set of model parameter values that provide the most likely explanation for the observed data. The priors are imposed as restraints; the partial residual R_crystallographic(Fo, Fc) encompasses the agreement with the observed data.

> My twocents,
> Nicholas

And mine in return :-)

Ethan
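The exchange above about maximising (likelihood * prior) and Ethan's remark that the priors are imposed as restraints can be written out as a short derivation. This is a sketch under the assumption of independent Gaussian priors on geometric quantities. The symbols \theta (model parameters), g_i (a bond length, angle, etc. computed from the model), g_i^0 (its ideal value) and \sigma_i (its expected spread) are introduced here for illustration and do not appear in the thread.

```latex
% Maximum a posteriori estimate of the model parameters \theta
\hat{\theta}
  = \arg\max_{\theta} \; L(F_o \mid \theta)\, P(\theta)
  = \arg\min_{\theta} \; \bigl[ -\ln L(F_o \mid \theta) - \ln P(\theta) \bigr]

% Assumed prior: independent Gaussians on geometric quantities g_i(\theta)
-\ln P(\theta)
  = \sum_i \frac{\bigl( g_i(\theta) - g_i^{0} \bigr)^2}{2\sigma_i^{2}} + \text{const}

% Hence the refinement target splits into a data term and a restraint term
T(\theta)
  = \underbrace{-\ln L(F_o \mid \theta)}_{\text{agreement with data}}
  + \underbrace{\sum_i \frac{\bigl( g_i(\theta) - g_i^{0} \bigr)^2}{2\sigma_i^{2}}}_{\text{restraints (prior)}}
```

Under these assumptions, minimising a restrained target is the same operation as maximising the posterior, and the relative weight of the two terms reflects how much the prior is trusted against the data.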
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Pavel,

On 16.09.10 17:56, Pavel Afonine wrote:
> Hi Dirk,
>
>> so, wouldn't the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for whatever purpose they are needed, irrespective of adding riding hydrogens or not.
>
> it is a great idea, and if you look in PDB deposited structure factors there is a number of them (but certainly not the majority) that are accompanied by Fcalc. However, a few things to keep in mind:
>
> - Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand by request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make the previously calculated Fcalc inconsistent with such a manipulated model.
>
> So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).

but this is exactly what one shouldn't do: manipulate the structure after the final refinement! And if you manipulate it for a good reason, do a last final refinement after that, before depositing coordinates and structure factors. Then, there will be no problems, as far as I can see.

Best regards,
Dirk

--
***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
***
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dirk,

>> - Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand by request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make the previously calculated Fcalc inconsistent with such a manipulated model.
>>
>> So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).
>
> but this is exactly what one shouldn't do: manipulate the structure after the final refinement! And if you manipulate it for a good reason, do a last final refinement after that, before depositing coordinates and structure factors. Then, there will be no problems, as far as I can see.

I apologize if what I wrote doesn't read clearly - this is exactly what I'm saying, in this particular reply and across the whole discussion. Note that I used the word "unfortunately" above.

Anyway, saying it again: what I mentioned is based on my (and not only my - see relevant papers) observations from running validation tools through the whole PDB and making note of such manipulated structures. It is a matter of fact that there are some intentionally or unintentionally manipulated models; it is very bad, it is unfortunate, and obviously I'm strictly against it. I'm against it to such a degree that I even went to the trouble of writing a paper on this matter, which I mentioned on this thread already: J. Appl. Cryst. 2010, 43, 669-67.

Therefore it is important to have a model that you can use to reproduce the reported statistics (for validation, at least), although having Fcalc around wouldn't hurt.

Sorry again if I wasn't clear in my previous reply.

All the best!
Pavel.
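To make the point about post-refinement edits concrete, here is a minimal sketch of the conventional R-factor, R = sum| |Fo| - k|Fc| | / sum|Fo|, with a single overall scale k. The amplitude lists are invented numbers for illustration only, and the simple ratio-of-sums scale is an assumption; real programs use more elaborate scaling and bulk-solvent models.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor on amplitudes.

    f_obs and f_calc are equal-length sequences of non-negative amplitudes.
    A single overall scale k = sum(Fo) / sum(Fc) is applied to Fcalc
    (an assumption made here to keep the sketch short).
    """
    k = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Invented amplitudes: the observed data, Fcalc from the refined model, and
# Fcalc after a few atoms were moved or deleted just before deposition.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_refined = [98.0, 52.0, 24.0, 81.0]
f_edited = [95.0, 55.0, 20.0, 85.0]

print("R (refined model):", r_factor(f_obs, f_refined))
print("R (edited model): ", r_factor(f_obs, f_edited))
```

The reported statistic belongs to one specific set of Fcalc. Any change to the model after that point gives a different Fcalc and therefore a different R, which is why a final refinement run after the last edit is needed for the deposited numbers to be reproducible.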
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
As a relatively inexperienced scientist, I find this discussion fascinating. I wonder if NMR and EM people are also worried about depositing enough modeled info to allow back-calculation of data.

Regarding the original discussion of whether to deposit riding hydrogens used in the refinement, it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other.

I wonder if it would help to deposit two different models: one that precisely reflects the model used in refinement, and the other an energy-minimized model with predicted hydrogens and alternative conformations removed?

Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology, Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122
Phone: 317-274-4371 Fax: 317-274-4686
email: qqho...@iupui.edu

On Sep 17, 2010, at 8:28 AM, Ian Tickle wrote:
>> Oh, goodness, I see: even here, we would need clear rules what the calculated structure factors are, which weights were used, which bulk solvent correction was applied ... a maze, too!
>
> Fortunately the X-ray restraint weights/target values are not an issue here: varying them changes the refined model parameters of course, but they do not appear in the structure factor formula, so they don't need to be specified in the mathematical model to obtain the Fcalcs. You would of course need to know all the weights and target values (as well as the SF formula) to reproduce the refinement to get the deposited model.
>
>> But could future programs really re-calculate the same structure factors from the deposited model? Because of the expected development of more advanced methods and algorithms, I have my doubts ... *sigh*
>
> Yes, if the deposited mathematical model is completely specified in terms of the SF formula used and the values of *all* the parameters that go into it, then in principle future versions of software using more advanced models will be able to reproduce the exact Fcalcs. This assumes that the advanced models will use the same 'core' formula but with additional terms and adjustable parameters, so that the simple model can be obtained from the advanced one by constraining the extra parameters to fixed values. However, if the simple model is not 'nested' inside the more advanced model in this way, then no, it will not be possible to reproduce the Fcalcs.
>
> However, as I implied, the main issue is that we're rather lax at fully specifying our models (both formulae and parameters): obviously if in future you don't have all the information you need to reproduce the calculation then you have no hope of getting the same Fcalcs!
>
> Cheers
>
> -- Ian
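Ian's statement that Fcalc is reproducible only when the formula and *all* its parameters are specified can be illustrated with a toy structure factor calculation. This is a deliberately simplified sketch, not any program's implementation. It assumes space group P1, an orthogonal cell, isotropic B factors, no bulk solvent, and resolution-independent scattering factors equal to the electron count. The atom list and cell are invented.

```python
import cmath
import math


def f_calc(hkl, atoms, cell):
    """Toy structure factor F(hkl) for P1 in an orthogonal cell.

    atoms: iterable of (electrons, (x, y, z) fractional, occupancy, B_iso)
    cell:  (a, b, c) in Angstrom, all angles assumed to be 90 degrees
    """
    h, k, l = hkl
    a, b, c = cell
    # 1/d^2 for an orthogonal cell; enters the Debye-Waller factor exp(-B/(4 d^2)).
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    total = 0j
    for electrons, (x, y, z), occ, b_iso in atoms:
        damping = math.exp(-b_iso * inv_d2 / 4.0)
        phase = cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        # Assumption: constant scattering factor equal to the electron count.
        total += occ * electrons * damping * phase
    return total


# Invented three-atom model: N, C and one riding H attached to N.
cell = (10.0, 12.0, 15.0)
heavy = [
    (7, (0.10, 0.20, 0.30), 1.0, 15.0),  # N
    (6, (0.25, 0.22, 0.31), 1.0, 14.0),  # C
]
with_h = heavy + [(1, (0.08, 0.12, 0.30), 1.0, 18.0)]  # riding H

for hkl in [(0, 0, 0), (1, 2, 0), (3, 1, 2)]:
    print(hkl, abs(f_calc(hkl, with_h, cell)), abs(f_calc(hkl, heavy, cell)))
```

Every quantity in the sum is a model parameter, so leaving any of them unstated (here, whether the H was included, with which position, occupancy and B) makes the original Fcalc unrecoverable. The restraint weights, by contrast, never appear in the formula, which is the distinction Ian draws above.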
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Very interesting discussion. I wonder if the inexperienced user of PDB really exists? I don't know anyone off-hand who would really make use of information from hydrogen positions but not understand the issues. Although I hear they have been sighted in the Everglades http://en.wikipedia.org/wiki/Skunk_ape Kendall
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ian and contributors to this interesting thread,

(please, scroll down a little bit)

On 15.09.10 23:34, Ian Tickle wrote:
> I should just like to point out that the main source of the disagreement here seems to be that people have very different ideas about what a 'model' is or should be. Strictly a model is a purely mathematical construct; in this case it consists of the appropriate equation for the calculated structure factor and the best-fit values of the various parameters (scattering factors, atomic positions, occupancies, B factors, TLS parameters etc.) that appear in it. A mathematical model is inevitably going to be an imperfect representation of reality, but hopefully it's the best one we can come up with, in the sense of best explaining the data without significant overfitting.
>
> The problem arises because many users of the PDB, and I suspect many contributors to this BB, particularly non-crystallographers, don't see it like that, because they view a PDB file as a physical model, i.e. not as the best fit to the data (assuming that the non-crystallographers even know what the data are!), but the closest representation of reality.
>
> The difference between the N-H bond lengths that Ed referred to illustrates the distinction between the mathematical and the physical model. The mathematical model requires that the bond length is 0.86 Ang because that value gives the best fit of the assumed spherical scattering factor of H to the deformation density of the X-H covalent bond. The physical model requires that it be 1.00 Ang because that is the internuclear distance found by spectroscopic methods and predicted by QM calculations.
>
> The same goes for B factors and TLS: to a large extent they are a mathematical construct whose purpose is to provide an optimal fit to the data. The connection of Bs and TLS with reality is tenuous at best; nevertheless people obviously would like to have a physical interpretation such as rigid-body correlated motion. The fact that Bragg scattering provides no information about correlated motion (you need to measure the diffuse scattering for that) doesn't seem to deter them!
>
> I have no doubt in my mind that it is the mathematical model that should be published, because hopefully it's the best available interpretation of the data. Whether that involves publishing the riding H atoms explicitly, or alternatively the formulae and parameters that were used to calculate their positions, I don't mind, as long as I can faithfully reproduce the Fcalcs to check the validity of the model. Then users of the PDB are free to *interpret* the mathematical models as physical models in an appropriate manner (e.g. by adjusting the bond lengths to H), and crystallographers have the untainted mathematical models needed to reproduce the Fcalcs.

so, wouldn't the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for whatever purpose they are needed, irrespective of adding riding hydrogens or not.

Best regards,
Dirk.

--
***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
***
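Ian's suggestion in the message quoted above, that users may reinterpret the mathematical model as a physical one "by adjusting the bond lengths to H", amounts to sliding each hydrogen along its bond. The sketch below does that for a single X-H pair. The function name `rescale_bond` and the coordinates are hypothetical, and the 0.86 and 1.00 Å values are the N-H distances quoted in the thread.

```python
import math


def rescale_bond(parent, hydrogen, new_length):
    """Move `hydrogen` along the parent->hydrogen direction to `new_length`.

    Both positions are Cartesian coordinates in Angstrom. The bond direction
    is kept, only the distance from the parent atom changes.
    """
    vec = [h - p for h, p in zip(hydrogen, parent)]
    old_length = math.sqrt(sum(v * v for v in vec))
    factor = new_length / old_length
    return tuple(p + v * factor for p, v in zip(parent, vec))


# Hypothetical N and riding H at the X-ray distance of 0.86 Angstrom.
n_atom = (12.00, 5.00, 3.00)
h_xray = (12.00, 4.14, 3.00)

# Internuclear ("physical") position at 1.00 Angstrom, same direction.
h_nuclear = rescale_bond(n_atom, h_xray, 1.00)
print("H at 1.00 A:", h_nuclear)
```

The conversion is trivial and lossless in both directions, which supports the view that the deposited file should carry the mathematical model and leave the physical reinterpretation to the user.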
Re: [ccp4bb] Deposition of riding H
Hi Pavel,

Note that in the ultra-high resolution structure of aldose reductase http://www.ncbi.nlm.nih.gov/pubmed/15146478 we didn't see all (or most) hydrogens. So, the converse question one could ask is: why didn't we see all of them? Was it only because of higher B-factors, or because some of them were stripped during data collection?

Note that in my original message I said they are, in most cases, still assumed. Ultra-high resolution structures are exactly what I meant by the few cases when some of the positions are not assumed, so thanks for pointing that out.

It's not all or nothing - some hydrogens will be stripped and some won't. But since we don't know which ones are gone, depositing the coordinates of all of them may be misleading. It can be particularly dangerous for structure-based functional interpretations, because several publications suggest that active sites are among the first regions to suffer from radiation damage. And aren't the functional interpretations the ultimate goal of protein structures?

Cheers,
N.

From: Pavel Afonine [mailto:pafon...@lbl.gov]
Sent: Wed 9/15/2010 5:56 PM
To: Sanishvili, Ruslan
Cc: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Deposition of riding H

> Hi Nukri,
>
> thanks for the paper (I haven't read the paper yet), I definitely missed this one!
>
> Interesting though: if we assume that they get stripped off during data collection, how could you see so many hydrogen atoms in Fo-Fc residual maps for the Aldose Reductase structure at 0.66A?
>
> B. Guillot, C. Jelsch, N. Muzet, C. Lecomte, E. Howard, B. Chevrier, A. Mitschler, A. Podjarny, A. Cousson, R. Sanishvili & A. Joachimiak (2000). Multipolar refinement of aldose reductase at subatomic resolution. Acta Cryst. A56, s199.
>
> E. I. Howard, R. Cachau, A. Mitschler, P. Barth, B. Chevrier, V. Lamour, A. Joachimiak, R. Sanishvili, M. Van Zandt, D. Moras & A. Podjarny (2000). Crystallization of Aldose Reductase leading to Single Wavelength (0.66 Å) and MAD (0.9 Å) subatomic resolution studies. Acta Cryst. A56, s57.
>
> A. D. Podjarny, A. Mitschler, I. Hazemann, T. Petrova, F. Ruiz, E. Howard, C. Darmanin, R. Chung, T. R. Schneider, R. Sanishvili, C. Schulze-Briesse, T. Tomizaki, M. Van Zandt, M. Oka, A. Joachimiak & O. El-Kabbani (2005). Inhibitor binding to aldose reductase studied at subatomic resolution. Acta Cryst. A61, c122.
>
> Pavel.
>
> On 9/15/10 3:34 PM, Sanishvili, Ruslan wrote:
>> Hi All,
>>
>> I have not read all messages in the thread, so my apologies if somebody already pointed out what I have to say. There is a lot of talk about how this or that software treats the riding hydrogens. What to do with the fact that, however these hydrogens are treated in calculations, they are, in most cases, still assumed? Meents et al http://scripts.iucr.org/cgi-bin/paper?xh0004 showed that proteins are stripped of hydrogens during X-ray data collection. So, IMHO it is a good argument against depositing the H coordinates in PDB.
>>
>> Cheers,
>> N.
>>
>> Ruslan Sanishvili (Nukri), Ph.D.
>> GM/CA-CAT Biosciences Division, ANL
>> 9700 S. Cass Ave. Argonne, IL 60439
>> Tel: (630)252-0665 Fax: (630)252-0667
>> rsanishv...@anl.gov
>>
>> -Original Message-
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of George M. Sheldrick
>> Sent: Wednesday, September 15, 2010 5:14 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] Deposition of riding H
>>
>>> Pavel: In my original email I very carefully gave credit for IMPLEMENTING the TLS concept. Of course the ideas and some programs had been around long before, but it was the IMPLEMENTATION IN REFMAC that resulted in TLS becoming widely used. I had actually considered putting it into SHELXL but had not done so for two reasons: (a) I was too lazy and (b) I missed an essential trick that REFMAC introduced, namely the combination of TLS with an additive isotropic B-value for each atom.
>>>
>>> Dale: You are quite correct that AFIX 137 breaks my argument about not depositing (SHELX) hydrogen atoms because they can be recalculated with no loss of experimental information. However, to be fair, if you generate the first .ins file using SHELXPRO (the recommended procedure) you will get AFIX 33, which doesn't have this problem.
>>>
>>> For Pavel and others unfamiliar with SHELXL, AFIX 33 is a pure riding model with a staggered methyl group, but AFIX 137 assumes local threefold symmetry, finds the initial torsion angle by a three-fold averaged fit to the difference density and then refines the torsion angle in the following cycles. Since this torsion angle is not given explicitly in the output files, if AFIX 137 hydrogens are not deposited, they cannot be regenerated except by a full refinement against the experimental data.
>>>
>>> George
>>>
>>> Prof. George M. Sheldrick FRS
>>> Dept. Structural Chemistry, University of Goettingen,
>>> Tammannstr. 4, D37077 Goettingen, Germany
>>> Tel. +49-551-39-3021 or -3068
>>> Fax. +49-551-39-22582
>>>
>>> On Wed, 15 Sep 2010
Re: [ccp4bb] Deposition of riding H- Are they or are they not? Additional experiments are needed
Well, maybe they are there (hydrogens), maybe they are not (it also depends on location). They, or something else, also boils sometimes. I also understand from some other publications such as doi:10.1107/S090904509002192 (cyclosporine) that hydrogen abstraction is irreversible. Is it supported by post mortem mass spectrometry in the case of cyclosporine and aldose reductase? What is left of the irradiated crystals - molecules with or without hydrogens - can be checked in a mass spectrometer.

BTW, for part of my early life I practiced small molecule X-ray crystallography, which is by definition ultra-high resolution. When we wished to know in critical cases where the hydrogens are, and whether they are there at all, we exchanged them with deuterium in large crystals and performed neutron diffraction. One major advantage of neutron diffraction over X-ray diffraction is that the latter is rather insensitive to the presence of hydrogen (H) in a structure, whereas the nuclei 1H and 2H (i.e. deuterium, D) are strong scatterers for neutrons. This means that the position of deuterium in a crystal structure and its thermal motions can be determined far more precisely with neutrons.

Dr Felix Frolow
Professor of Structural Biology and Biotechnology
Department of Molecular Microbiology and Biotechnology
Tel Aviv University 69978, Israel
Acta Crystallographica F, co-editor
e-mail: mbfro...@post.tau.ac.il
Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608

On Sep 16, 2010, at 15:45, Sanishvili, Ruslan wrote:
> Hi Pavel,
>
> Note that in the ultra-high resolution structure of aldose reductase http://www.ncbi.nlm.nih.gov/pubmed/15146478 we didn't see all (or most) hydrogens. So, the converse question one could ask is: why didn't we see all of them? Was it only because of higher B-factors, or because some of them were stripped during data collection?
>
> Note that in my original message I said they are, in most cases, still assumed.
Ultra-high resolution structures are exactly what I meant under few cases when some of the positions are not assumed, so thanks for pointing that out. It's not all or nothing - some hydrogens will be stripped and some won't. But since we don't know which ones are gone, depositing the coordinates of all of them may be misleading. It can be particularly dangerous for structure-based functional interpretations because several publications suggest that active sites are one of the first ones to suffer from radiation damage. And aren't the functional interpretations the ultimate goal of protein structures? Cheers, N. From: Pavel Afonine [mailto:pafon...@lbl.gov] Sent: Wed 9/15/2010 5:56 PM To: Sanishvili, Ruslan Cc: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Deposition of riding H Hi Nukri, thanks for the paper (I haven't read the paper yet), I definitely missed this one! Interesting though, if we assume that they get stripped off during data collection, how you could see so many hydrogen atoms in Fo-Fc residual maps for Aldose Reductase structure at 0.66A? B. Guillot, C. Jelsch, N. Muzet, C. Lecomte, E. Howard, B. Chevrier, A. Mitschler, A. Podjarny, A. Cousson, R. Sanishvili A. Joachimiak (2000). Multipolar refinement of aldose reductase at subatomic resolution. Acta Cryst. A56, s199. E. I. Howard, R. Cachau, A. Mitschler, P. Barth, B. Chevrier, V. Lamour, A. Joachimiak, R. Sanishvili, M. Van Zandt, D. Moras A. Podjarny (2000). Crystallization of Aldose Reductase leading to Single Wavelength (0.66 Å) and MAD (0.9 Å) subatomic resolution studies. Acta Cryst. A56, s57. A. D. Podjarny, A. Mitschler, I. Hazemann, T. Petrova, F. Ruiz, E. Howard, C. Darmanin, R. Chung, T. R. Schneider, R. Sanishvili, C. Schulze-Briesse, T. Tomizaki, M. Van Zandt, M. Oka, A. Joachimiak O. El-Kabbani (2005). Inhibitor binding to aldose reductase studied at subatomic resolution. Acta Cryst. A61, c122. Pavel. 
On 9/15/10 3:34 PM, Sanishvili, Ruslan wrote: Hi All, I have not read all messages in the trace so my apologies if somebody already pointed out what I have to say. There is lot of talk about how this or that software treats the riding hydrogens. What to do with the fact that however these hydrogens are treated in calculations, they are, in most cases, still assumed? Meents et al http://scripts.iucr.org/cgi-bin/paper?xh0004 showed that proteins are stripped of hydrogens during X-ray data collection. So, IMHO it is a good argument against depositing the H coordinates in PDB. Cheers, N. Ruslan Sanishvili (Nukri), Ph.D. GM/CA-CAT Biosciences Division, ANL 9700 S. Cass Ave. Argonne, IL 60439 Tel: (630)252-0665 Fax: (630)252-0667 rsanishv...@anl.gov -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of George M. Sheldrick Sent: Wednesday, September 15, 2010 5:14 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb
Re: [ccp4bb] Deposition of riding H
Hi Nukri,

Note that in the ultra-high resolution structure of aldose reductase http://www.ncbi.nlm.nih.gov/pubmed/15146478 we didn't see all (or most) hydrogens. So, the converse question one could ask is why we didn't see all of them? Was it only because of higher B-factors or because some of them were stripped during data collection?

yes, we saw ~54% of them - I used to work on this at some point too (Blakeley MP, Ruiz F, Cachau R, Hazemann I, Meilleur F, Mitschler A, Ginell S, Afonine P, Ventura ON, Cousido-Siah A, et al. Quantum model of catalysis based on a mobile proton revealed by subatomic x-ray and neutron diffraction studies of h-aldose reductase. Proc Natl Acad Sci U S A. 2008;105(6):1844-1848).

My impression at that point was that we did not see the rest partially because the model was not good enough (in terms of seeing fine details). What I mean is that improving the model from an R-factor of ~10% to ~9% resulted in adding ~10% more visible H atoms. When I then refined the model down to ~7% using the Interatomic Scatterers model (to account for deformation density), the number of observable H atoms increased from the published 54% up to ~68% or so (writing from memory). So, hypothetically, I guess, if we could refine it down to some lower R-factor we would then see even more H atoms (and the rest, if we finally don't see them, would probably be those that are gone). The resolution and B-factors are necessary but not sufficient to see H atoms - the overall noise level is key too.

All the best! Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 01:25:12 am Dirk Kostrewa wrote: so, wouldn't be the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for what ever purpose they are needed, irrespective of adding riding hydrogens or not. Now I'm confused. Isn't that already the recommended, if not required, practice? Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Dirk,

so, wouldn't be the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for what ever purpose they are needed, irrespective of adding riding hydrogens or not.

it is a great idea, and if you look at the structure factors deposited in the PDB, a number of them (but certainly not the majority) are accompanied by Fcalc. However, a few things to keep in mind:

- Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand at the request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make the previously calculated Fcalc inconsistent with such a manipulated model. So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).

- To reproduce the most commonly used electron density maps, such as 2mFo-DFc and mFo-DFc, you would also need to deposit the coefficients m and D or, alternatively, have a program and free-R flags handy to compute m and D yourself.

- Requiring Fcalc, you would have to make sure that these are actually the total structure factors Fmodel = scales*(Fcalc_atoms + F_bulk_solvent) with all other appropriate scales included. Although, this is easy to do by computing the R-factor and comparing it with the reported number.

All the best!
Pavel.
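The consistency check in the last point above can be sketched in a few lines. This is only an illustration, not what any refinement program does internally: the function names are made up here, and a single overall scale k is an assumption (real programs apply resolution-dependent and anisotropic scales).

```python
# Sketch only: one overall scale k is an assumption; refinement programs
# use more elaborate (anisotropic, resolution-dependent) scaling.

def total_model_amplitudes(f_calc_atoms, f_bulk_solvent, k=1.0):
    """Return |Fmodel| = |k * (Fcalc_atoms + F_bulk_solvent)| per reflection.

    Both inputs are sequences of complex structure factors of equal length.
    """
    return [abs(k * (fa + fb)) for fa, fb in zip(f_calc_atoms, f_bulk_solvent)]


def r_factor(f_obs, f_model):
    """R = sum(| |Fobs| - |Fmodel| |) / sum(|Fobs|) over matching reflections."""
    numerator = sum(abs(fo - fm) for fo, fm in zip(f_obs, f_model))
    denominator = sum(f_obs)
    return numerator / denominator


def matches_reported(f_obs, f_model, reported_r, tol=0.005):
    """True if the recomputed R agrees with the reported value within tol."""
    return abs(r_factor(f_obs, f_model) - reported_r) <= tol
```

If the recomputed R is far from the reported one, the deposited amplitudes are probably not the total Fmodel, or the model was touched after the last refinement cycle.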
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Ethan wrote:

I believe that deposition of Fc Phic FOM should be required. Certainly it should be the recommended practice.

For the same series of structures I just deposited, which started the riding H discussion, my mtz file had Fc Phic FOM + other data put out by Phenix - Pavel can elaborate. rcsb stripped almost all of this and the processed file has only: HKL, Flag, Fc, SigmaF and FOC :{

What's a structural biologist to do?

-- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Mark, I assume you deposited the mtz? This is what Ethan was referring to - the pdb does not do well with maintaining all the relevant columns when submitting the mtz file. However, if you convert your mtz to cif yourself and make sure it has all the columns you would like to include and then submit this cif file to the pdb, all the information is retained. Eric __ Eric Larson, PhD Biomolecular Structure Center Department of Biochemistry Box 357742 University of Washington Seattle, WA 98195
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 09:56:14 am Dr. Mark Mayer wrote:

Ethan wrote: I believe that deposition of Fc Phic FOM should be required. Certainly it should be the recommended practice.

For the same series of structures I just deposited, which started the riding H discussion, my mtz file had Fc Phic FOM + other data put out by Phenix - pavel can elaborate. rcsb stripped almost all of this and the processed file has only: HKL, Flag, Fc, SigmaF and FOC :{

Huh? That's not a cif fragment. What file are you looking at? In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif. Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course.

    loop_
    _refln.crystal_id
    _refln.wavelength_id
    _refln.scale_group_code
    _refln.status
    _refln.index_h
    _refln.index_k
    _refln.index_l
    _refln.F_meas_au
    _refln.F_meas_sigma_au
    _refln.intensity_meas
    _refln.intensity_sigma
    _refln.F_calc
    _refln.fom
    _refln.phase_meas

Caveat: I have never tried to deposit a structure factor file from phenix; maybe that triggers some other processing pathway. Does anyone here know?

I would say that the simple, and almost guaranteed to work, procedure is to do the cif conversion yourself and deposit the cif file. I noted in another message that the auto-conversion script on the PDB deposition site has a tendency to lose columns. That's why it is better to do the conversion yourself. I can't say that they _never_ lose columns in an uploaded cif file. I have had that happen, but only once and quite a while ago.

What's a structural biologist to do?

The empiricist's approach. Experiment till you find a procedure that works, then stick to it :-)

-- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
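A quick way to follow Ethan's advice and see which columns survived processing is to read the tag list of the _refln loop in the returned sf.cif. The sketch below is a deliberately naive line-based reader, not a real CIF parser: it assumes one tag per line after loop_, as in the fragment above, and the function names and the "wanted" list in the usage note are illustrative only.

```python
def refln_columns(cif_text):
    """Return the column names (without the '_refln.' prefix) of the first
    loop_ whose tags all belong to the _refln category.

    Naive reader: assumes one tag per line in the loop header.
    """
    tags, in_header = [], False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower() == "loop_":
            # A new loop header starts; forget tags from any previous one.
            tags, in_header = [], True
        elif in_header and line.startswith("_"):
            tags.append(line.split()[0])
        elif in_header:
            # First data row reached: the header is complete.
            in_header = False
            if tags and all(t.startswith("_refln.") for t in tags):
                return [t[len("_refln."):] for t in tags]
    return []


def missing_columns(cif_text, wanted):
    """Return the entries of 'wanted' that are absent from the _refln loop."""
    present = set(refln_columns(cif_text))
    return [name for name in wanted if name not in present]
```

For example, missing_columns(text, ["F_calc", "phase_calc", "fom"]) lists whichever of those three did not make it into the processed file. Doing this check before and after deposition makes a lost column obvious.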
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thu, Sep 16, 2010 at 10:19:14AM -0700, Ethan Merritt wrote: [...]

What's a structural biologist to do?

The empiricist's approach. Experiment till you find a procedure that works, then stick to it :-)

... or the social approach: communicate with the person at the PDB responsible for your deposition. So far that has worked great for me (plaudits for the people at the PDB(e)).

Tim

-- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
[ccp4bb] Deposition of riding H: R-factor is overrated
Huh? That's not a cif fragment. What file are you looking at? In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course.

Come on guys - give me a break ... all I posted was just a list of the columns in the sf file - here's a cut and paste of what rcsb actually generated

rcsb061284-sf.cif

    data_r3om0sf
    #
    _audit.revision_id 1_0
    _audit.creation_date ?
    _audit.update_record 'Initial release'
    loop_
    _refln.wavelength_id
    _refln.crystal_id
    _refln.scale_group_code
    _refln.index_h
    _refln.index_k
    _refln.index_l
    _refln.status
    _refln.F_meas_au
    _refln.F_meas_sigma_au
    _refln.fom
    1 1 1 0 0 8 o 203.0 6.3 0.99
    1 1 1 0 0 10 o 281.5 8.7 0.86

Below is mtzdmp of what I actually deposited (as MTZ)

    Col Sort      Min       Max    Num      %      Mean      Mean   Resolution   Type Column
    num order                   Missing complete             abs.   Low    High       label
      1 ASC         0        46      0  100.00     17.7      17.7  31.88   1.40   H  H
      2 NONE        0        72      0  100.00     27.4      27.4  31.88   1.40   H  K
      3 NONE        0        81      0  100.00     30.5      30.5  31.88   1.40   H  L
      4 NONE      3.3    2160.3      0  100.00   162.89    162.89  31.88   1.40   F  FOBS
      5 NONE      0.9      60.0      0  100.00     5.36      5.36  31.88   1.40   Q  SIGFOBS
      6 NONE      0.0       1.0      0  100.00     0.05      0.05  31.88   1.40   I  R_FREE_FLAGS
      7 NONE      0.1    2253.6      0  100.00   157.73    157.73  31.88   1.40   F  FMODEL
      8 NONE   -180.0     180.0      0  100.00     2.65     90.13  31.88   1.40   P  PHIFMODEL
      9 NONE      0.0    5823.1      0  100.00   219.29    219.29  31.88   1.40   F  FCALC
     10 NONE   -180.0     180.0      0  100.00     3.24     90.09  31.88   1.40   P  PHIFCALC
     11 NONE      0.0   15330.0      0  100.00   141.04    141.04  31.88   1.40   F  FMASK
     12 NONE   -180.0     180.0      0  100.00     4.29     90.74  31.88   1.40   P  PHIFMASK
     13 NONE      0.0    6909.4      0  100.00    15.42     15.42  31.88   1.40   F  FBULK
     14 NONE   -180.0     180.0      0  100.00     4.29     90.74  31.88   1.40   P  PHIFBULK
     15 NONE    0.803     1.199      0  100.00    1.004     1.004  31.88   1.40   W  FB_CART
     16 NONE    0.001     1.000      0  100.00    0.877     0.877  31.88   1.40   W  FOM
     17 NONE    0.576     0.754      0  100.00    0.705     0.705  31.88   1.40   W  ALPHA
     18 NONE  277.388                0  100.00 5655.391  5655.391  31.88   1.40   W  BETA

-- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 10:34:14 am Dr. Mark Mayer wrote: Huh? That's not a cif fragment. What file are you looking at? In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course. Come on guys - give me a break ... all I posted was just a list of the columns in the sf file I sincerely apologize. Believe it or not, I mistook your emoticon for part of a file syntax that I was not familiar with. HKL, Flag, Fc, SigmaF and FOC :{ I thought that colon + curly bracket was some funky data delimiter. Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] Deposition of riding H
Hello, a few points to balance the discussion:

- if you refined your structure without H then it's obviously ok to deposit it without H;

- if you refined your structure with H, then you should deposit it with H, as your refinement software outputs it (if your software uses H but removes it automatically for you - then at least it's not your responsibility). Any post-refinement manipulation of the final refined model is bad since it tends to invalidate the reported statistics (R-factors, for instance), which is illustrated in this paper (section 3.1.5: J. Appl. Cryst. (2010). 43, 669-676). Indeed, let's not add more inconsistencies to the database because of a fear that insufficiently trained people may misinterpret it.

- removing H, if really needed, is a matter of one trivial command, but adding them back the exact same way they were originally is less straightforward.

- I agree that the X-H distances used in refinement and in validation are slightly different, although I'm not sure how much of a difference that would make for validation.

Pavel.

On 9/14/10 10:38 PM, Ed Pozharski wrote:

Mark, On Tue, 2010-09-14 at 13:34 -0400, Dr. Mark Mayer wrote: Where does the crystallographic community stand on deposition of coordinates with riding hydrogens?

Surely community is divided on this. There could be arguments made both ways. Personally, I think that riding hydrogens can be calculated if necessary using the same algorithms/parameters employed upon refinement. It is true that different programs may use different parameter sets and reproducing exactly the same set of riding hydrogens may be difficult without exact knowledge of which version was used and ability to unearth that old version of the software. This may preclude one from getting exactly the same riding hydrogen positions (how large that difference would be I honestly don't know). But really, who cares? What is the benefit of knowing exactly where this or that riding hydrogen was?
Maybe there is some benefit of such comparison in method development, but I would think its rather limited. I wholeheartedly agree with Ethan (even though that is not strictly what he said :) that some minor benefit here is completely negated by the danger of perception that somehow models tell us where hydrogens are. It is bad enough that, in my estimate, roughly 10% of atomic coordinates in the PDB are unwarranted as they come from disordered residues with exact spatial positions unsupported by electron density. Let's not add more things that PDB users may over-interpret. Cheers, Ed.
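For readers wondering what the "one trivial command" for removing H amounts to, here is a minimal sketch. It is not the command from any particular package; it assumes a PDB-format text whose ATOM/HETATM records carry the element symbol in columns 77-78, and the function names are made up for this illustration. It does nothing about the harder direction (regenerating the hydrogens exactly as the refinement program placed them), and it does not renumber atom serials.

```python
def is_hydrogen(record):
    """True for ATOM/HETATM records whose element field (columns 77-78)
    is H or D. Relies on the element columns being filled in."""
    if not record.startswith(("ATOM  ", "HETATM")):
        return False
    element = record[76:78].strip().upper()
    return element in ("H", "D")


def strip_hydrogens(pdb_text):
    """Return the PDB text with hydrogen/deuterium atom records dropped.
    All other lines (header, CRYST1, heavy atoms, END) are kept unchanged."""
    kept = [line for line in pdb_text.splitlines() if not is_hydrogen(line)]
    return "\n".join(kept) + "\n"
```

Filtering on the element columns rather than on the atom name avoids discarding atoms such as mercury (HG), whose names merely start with H. As noted in the thread, a model edited this way no longer matches the statistics computed with the hydrogens present.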
Re: [ccp4bb] Deposition of riding H
Dear Pavel,

On Wed, Sep 15, 2010 at 07:57:09AM -0700, Pavel Afonine wrote: Hello, a few points to balance the discussion:

Your points sound more like a summary than a contribution to the discussion, which might confuse inexperienced readers of this thread, especially if they did not follow it completely. So here let me counter-balance ;-)

- if you refined your structure without H then it's obviously ok to deposit it without H;

I do not disagree but would like to add that I believe riding atoms usually improve the refinement even at poor resolution, so one should not refine without them at the final stage in the first place.

- if you refined your structure with H, then you should deposit it with H, as your refinement software outputs it (if your software uses H but removes it automatically for you - then it least it's not your responsibility). Any post-refinement manipulation on final refined model is bad since it tends to invalidate the reported statistics (R-factors, for instance), which is illustrated in this paper (section 3.1.5: J. Appl. Cryst. (2010). 43, 669-676). Indeed, let's not add more inconsistencies to the database because of a fear that insufficiently trained people may misinterpret it.

I disagree because, since the H-atoms are (usually) in a riding position and used e.g. for anti-bumping restraints, they should be considered as (software dependent, as George pointed out) restraints rather than the actual model in terms of coordinates.

- removing H, if really needed, is a matter of one trivial command, but adding them back the exact same way they were originally is less straightforward.

I despise the word 'trivial', and as much as there is a 'useless use of cat' there is probably also an 'unnecessary use of trivial'.

Cheers, Tim

- I agree that the X-H distances used in refinement and in validation are slightly different, although I'm not sure how much of difference that would make for validation. Pavel.
-- Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
Re: [ccp4bb] Deposition of riding H
While I am sympathetic to Ethan's and George's arguments, what is missing in the world as it stands is a section in PDB files that encode the parameters and rules used to generate the riding hydrogen atoms for that particular model. George has his favorite hydrogen atoms to build, his favorite bond lengths for placing them (and good arguments for his selections) and one could, I suppose, look them up in the documentation for Shelxl, but they should be encoded in the PDB file to allow automatic regeneration of the hydrogen atoms. An explicit listing of the rules for generation is particularly needed since all these matters can, and often are, modified by the user. I know that in my refinements I manually move the hydrogen from one nitrogen to the other in a couple Histidine side chains, and have created my own rules for hydrogen generation in co-factors. CIF tags will have to be agreed upon (and that's always a fun job) that would allow the description of the details of the various hydrogen atom generation schemes that are in use, or may be used in the future. It would also be handy to have a reference implementation, available under some forgiving license, that would materialize the hydrogen atoms given the PDB header information, and would reproduce the exact model refined, for any of the refinement programs. This is a worthwhile goal, but a tall order. Until this infrastructure is in place I think the hydrogen atoms have to be included in the PDB file. Otherwise it's the same as saying that I've refined TLS ADP's but not saying what the TLS parameters were nor listing the atoms in each TLS group. Dale Tronrud P.S. George: Do you think hydrogen atoms generated by the HFIX 137 command should be deposited? They are placed based on the electron density map with the dihedral angle of the methyl group becoming a parameter of the model -- a parameter not recorded anywhere other than in the hydrogen atom locations. On 09/14/10 12:41, George M. 
Sheldrick wrote:

Even though SHELXL refinements often involve resolutions of 1.5A or better, I discourage SHELXL users from depositing their hydrogen coordinates. There are three reasons:

1. The C-H, N-H and O-H distances required to give the best fit to the electron density are significantly shorter than those required for molecular modeling and tests on non-bonded interactions (or located by neutron diffraction). It is ESSENTIAL to recalculate the hydrogens at longer distances before using MolProbity and other validation software.

2. There is considerable confusion concerning the names to be assigned to the hydrogens. This is not made easier by the application of a chirality test to -CH2- groups!

3. O-H hydrogens are particularly difficult to 'see' and the geometrical calculation of their positions is often ambiguous. The same applies to the protonation states of histidines and carboxylic acids. In addition such hydrogen positions are often disordered.

For refinement I recommend including C-H and N-H but not O-H hydrogens. For very high resolution structures this reduces Rfree by 0.5-1.0% and clearly improves the model. At all resolutions the antibumping restraints involving hydrogens are useful.

George

Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582

On Tue, 14 Sep 2010, Dr. Mark Mayer wrote:

Here's one for the community, which I'll post to both Phenix and CCP4 BBs. Where does the crystallographic community stand on deposition of coordinates with riding hydrogens? Explicit H are required for calculating all atom clash scores with Molprobity, and their use frequently gives better geometry (especially at low resolution). Phenix uses explicit riding H for refinement, and outputs these in the refined PDB. Refmac also uses riding H but does not output H coordinates.
While depositing a series of structures refined at 1.4 - 2.75 A with Phenix, I got the following email from the RCSB, who asked that I resupply coordinates without H for two of the structures. Since we can't see H even at 1.4 Å, I don't understand why an arbitrary cut-off of 1.5 Å was chosen, and also why explicit H atoms used in refinement and geometry validation should be stripped from the file.

FROM RCSB: We encourage depositors not to use hydrogens in the final PDB file for the low resolution structures (> 1.5 A). Please provide an updated PDB file. We request you to use processed PDB file as a starting point for making any corrections to the coordinates and/or re-refinement.

-- Mark
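George's first point (recalculating hydrogens at longer X-H distances before validation) comes down to moving each H along its existing bond direction. The sketch below shows only that geometric step. The function name is hypothetical, the target length is supplied by the caller, and no particular bond-length values are implied; the programs mentioned in the thread have their own tables and procedures.

```python
import math


def rescale_xh(parent_xyz, h_xyz, target_length):
    """Move H along the existing X-H direction so that |X-H| == target_length.

    parent_xyz and h_xyz are (x, y, z) tuples in orthogonal Angstrom
    coordinates; the parent atom stays where it is.
    """
    delta = [h - p for p, h in zip(parent_xyz, h_xyz)]
    current = math.sqrt(sum(c * c for c in delta))
    if current == 0.0:
        raise ValueError("parent and hydrogen coincide; direction undefined")
    scale = target_length / current
    return tuple(p + scale * c for p, c in zip(parent_xyz, delta))
```

Because only the bond length changes, angles and torsions involving the hydrogen are preserved, which is the sense in which a riding hydrogen is fully determined by its parent atoms plus a rule.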
Re: [ccp4bb] Deposition of riding H
On Wednesday 15 September 2010, Pavel Afonine wrote: - if you refined your structure with H, then you should deposit it with H, as your refinement software outputs it As I see it, refining your structure in the presence of riding hydrogens is not the same thing as refining hydrogen positions in your structure. Let's exclude those rare cases of the latter from discussion. Tim Gruene wrote: since the H-atom are (usually) in a riding position and used e.g. for anti-bumping restraints, they should be considered as (software dependent, as George pointed out) restraints rather than the actual model in terms of coordinates. I agree. The use of a riding hydrogen model is better viewed as a refinement restraint than as a refinement of actual hydrogen positions. Ethan
Re: [ccp4bb] Deposition of riding H
Dear Dale,

The PDB format is, as far as I can see, incapable of containing all the information that you can store in an .ins-file used for shelxl refinement, so the only way one could recreate the model would be to also deposit the .ins-file anyhow, which solves the problem of the riding hydrogens altogether (and much more). pdbe.org, for example, allows uploading auxiliary files, and in my opinion uploading the final .ins-file (not the .res-file!) should be made mandatory in the case of shelxl refinement. Since coot has now become utterly convenient even for shelxl refinement, there is no reason one should not deposit the .ins-file ([flame] and the PDB-file probably for legacy reasons [/flame]).

Tim
Re: [ccp4bb] Deposition of riding H
On Wed, 2010-09-15 at 07:57 -0700, Pavel Afonine wrote: if you refined your structure with H, then you should deposit it with H sure. But the structure is not *refined with hydrogens* when they are in predicted positions. Following the same logic one could suggest that electron density should be deposited, since we can approximate it. I think it's useful to limit the information presented in a pdb-file to what was actually refined + specific instructions on how the refinement was done. Any post-refinement manipulation on final refined model is bad since it tends to invalidate the reported statistics ... Indeed, let's not add more inconsistencies to the database because of a fear that insufficiently trained people may misinterpret it. I wouldn't call it a post-refinement manipulation, as nothing was really changed (afaiu, in most cases riding hydrogens are placed automatically by the program and not manipulated by user). On a digressing point, you might be underestimating the problem of misinterpretation by insufficiently trained people. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H
Hi Tim, The pdbe.org, for example, allows one to upload auxiliary files and in my opinion the uploading of the final .ins-file (not the .res-file!) should be made mandatory in the case of shelxl refinement. Since coot has now become utterly convenient even for shelxl refinement, there is no reason one should not deposit the .ins-file ([flame] and the PDB-file probably for legacy reasons [/flame]).
I have always wondered but never had a good occasion to ask (my Shelxl knowledge is limited and may be outdated, so I apologize in advance if my questions are too naive; I also realize that I'm asking a non-CCP4 question on CCP4bb, for which I apologize again):
- How does the .ins file encode the information about NCS groups used in refinement (atom selections for NCS groups, restraint weights for different groups, etc.)?
- How does the .ins file encode the information about TLS (again, atom selections for TLS groups, TLS matrices, etc.)? Relatedly, does it have a concept of TLS and other components contributing to the total atomic displacement parameter (ADP)?
- If I recall correctly, to fix (= not refine) a certain parameter (say occupancy or B-factor) in Shelxl you need to add 10 to it. Is that true? IMHO, this might lead to confusion if such a file gets deposited in the PDB.
All the best! Pavel.
Re: [ccp4bb] Deposition of riding H
Pavel, Shelxl works in the correct coordinates: fractional... Many things are easier in fractional coordinates. Are you sure that Phenix does not go orthogonal - fractional - orthogonal in its internal calculations? When a parameter is fixed in fractional coordinates it does not produce confusion. Shelxl also converts fractional to orthogonal (AKA PDB) coordinates, which is also correct. The constraint is not transferred there. BTW Shelxl knows symmetry very well and will constrain atoms that occupy symmetry elements. Shortly Shelxl knows crystallography best. When you see the number of lines in the Shelxl Fortran code (do not kill Fortran too early) you will be surprised. There are not so many of them. No graphical user interface yet, but COOT is of great help. Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica F, co-editor e-mail: mbfro...@post.tau.ac.il Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608
Re: [ccp4bb] Deposition of riding H
On 15 Sep 2010, at 18:04, Ed Pozharski wrote: On Wed, 2010-09-15 at 07:57 -0700, Pavel Afonine wrote: if you refined your structure with H, then you should deposit it with H sure. But the structure is not *refined with hydrogens* when they are in predicted positions. Following the same logic one could suggest that electron density should be deposited, since we can approximate it. And I notice that a fair number of groups do deposit electron density - at least, they deposit PHIC and sometimes even HL coefficients in the sf.cif file. HL coefficients in the sf.cif file can get badly corrupted in the deposition process, but they definitely show willing. I think it's useful to limit the information presented in a pdb-file to what was actually refined + specific instructions on how the refinement was done. I suppose I come to this from a background where every deposition is a fresh new test-case for new refinement software; it's only lack of download bandwidth and CPU power that makes me not want to start from the images. I like the idea that what you deposit is the output of a well-defined refinement; which means that you need to deposit the instructions for doing the refinement, and the model you used as input. There's a perfectly good PDB protocol for multi-MODEL files. Nobody does such depositions, I think the PDB would complain if you tried, and there's the problem of endless regression. I would be very happy if every PDB deposition with 'METHOD: MOLECULAR REPLACEMENT' had an extra MODEL in it containing the input to the molrep tool, and some REMARK lines describing how molrep was used; I would not complain if this was made compulsory for depositions which nowadays say 'STARTING MODEL: NULL'. 26 of the 130 depositions with method MOLECULAR REPLACEMENT this week have starting model NULL, as well as seven depositions with method FOURIER SYNTHESIS and starting model NULL. (why do MAD and SAD depositions still have a STARTING MODEL field?) 
(while we're on the subject of riding hydrogens, I would invite people to admire the conformations of the hydrogens in such places as the C-alpha of residues A45 and A57 of deposition 2x5n - it's clearly a software bug rather than any mistake on the part of the authors, but nonetheless striking) Tom
Re: [ccp4bb] Deposition of riding H
Dear Felix, Shortly Shelxl knows crystallography best. I have no doubts about this. My questions were, though:
- How does the .ins file encode the information about NCS groups used in refinement (atom selections for NCS groups, restraint weights for different groups, etc.)?
- How does the .ins file encode the information about TLS (again, atom selections for TLS groups, TLS matrices, etc.)? Relatedly, does it have a concept of TLS and other components contributing to the total atomic displacement parameter (ADP)?
- If I recall correctly, to fix (= not refine) a certain parameter (say occupancy or B-factor) in Shelxl you need to add 10 to it. Is that true? IMHO, this might lead to confusion if such a file gets deposited in the PDB.
Best, Pavel.
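On the last question, a minimal sketch may help. It assumes the "10m + p" coding convention as described in the SHELX manual: a parameter is written as 10m + p with |p| < 5, where m = 0 means "refine starting from p", m = 1 means "fix at p", m > 1 ties the value to p times free variable m, and m < -1 ties it to p times (free variable -m, minus 1). The function name and the returned labels are hypothetical and chosen only for illustration; they are not part of SHELXL.

```python
def decode_shelxl_parameter(value):
    """Split a SHELXL-style coded parameter into (mode, p, free_variable).

    Illustrative only: follows the 10m + p rule as summarised in the
    lead-in, and does not handle every special case in the manual.
    """
    m = int(round(value / 10.0))  # multiple of 10 carried by the code
    p = value - 10.0 * m          # underlying parameter value, |p| < 5
    if m == 0:
        return ("refined", p, None)
    if m == 1:
        return ("fixed", p, None)
    if m > 1:
        return ("p * fv", p, m)
    if m < -1:
        return ("p * (fv - 1)", p, -m)
    raise ValueError("m = -1 is not covered by this sketch")


# An occupancy written as 11.0 is read as "fixed at 1.0".
print(decode_shelxl_parameter(11.0))   # ('fixed', 1.0, None)
# An occupancy written as 21.0 is 1.0 times free variable 2.
print(decode_shelxl_parameter(21.0))   # ('p * fv', 1.0, 2)
```

Under this reading, a reader unfamiliar with the convention could indeed mistake 11.0 for an occupancy of eleven, which is the confusion the question anticipates.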
Re: [ccp4bb] Deposition of riding H + what to deposit in addition to the pdb
On Wed, 2010-09-15 at 09:14 -0700, Dale Tronrud wrote: I know that in my refinements I manually move the hydrogen from one nitrogen to the other in a couple Histidine side chains, and have created my own rules for hydrogen generation in co-factors. Excellent point. And I believe in this case you are perfectly justified to either place a comment about this in the header or indeed deposit hydrogens. But I suspect that this is not what happens in most cases with, say, 2A refinement using refmac. The program is simply used to autogenerate the riding hydrogens, thus making the whole thing perfectly reproducible (with caveats). It may be seen as misleading when one deposits these hydrogens and they appear to have the same standing as other atoms which were actually refined. On a related issue, I believe a policy change is long overdue: all the input files, e.g. command scripts/cif-files for ligands/.ins files etc., should be attached to a PDB deposition. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H
Dear Ed, Any post-refinement manipulation on final refined model is bad since it tends to invalidate the reported statistics ... Indeed, let's not add more inconsistencies to the database because of a fear that insufficiently trained people may misinterpret it. I wouldn't call it a post-refinement manipulation, as nothing was really changed (afaiu, in most cases riding hydrogens are placed automatically by the program and not manipulated by user). On a digressing point, you might be underestimating the problem of misinterpretation by insufficiently trained people. ... This may preclude one from getting exactly the same riding hydrogen positions (how large that difference would be I honestly don't know). But really, who cares? I wouldn't dare calling a model manipulation that typically changes the R-factor by 0.5 ... ~2% as nothing. Although, you may be right - who cares? I think the problem of misinterpretation by insufficiently trained people should not be propagated to the database, where it would affect the quality of the deposited material. This is what I meant. Pavel.
Re: [ccp4bb] Deposition of riding H + what to deposit in addition to the pdb
The pdbe.org, for example allows to upload auxiliary files and in my opinion the uploading of the final .ins-file (not the .res-file!) should be made mandatory in the case of shelxl refinement. all the input files, e.g. command scripts/cif-files for ligands/.ins files etc. should be attached to a PDB deposition. Having access to all input files required to reproduce (modulo program/library version) the final/published refinement would be most helpful.
Re: [ccp4bb] Deposition of riding H
Dear Pavel, May I suggest that you take a look at the SHELX manual: http://shelx.uni-ac.gwdg.de/SHELX/shelx.pdf before sending your SHELX questions to CCP4bb? You might even find some good ideas for implementing in phenix_refine! For example, if you look up 'non-crystallographic symmetry' in the index you will discover that SHELXL applies NCS in the form of restraints, not constraints, which has the advantage that it can be applied locally and in combination with all other restraints and constraints involving the same atoms. However you will not find TLS in the index, because the credit for implementing this very useful concept should be given to Martin Winn, Garib and Ethan, long after the current version of SHELXL (and its manual) were released in 1997. And because SHELXL only reads one instruction file (*.ins) and one reflection file (*.hkl) but no other data files or libraries, and FORTRAN will always be FORTRAN, the deposition of these two files would be sufficient to define the refinement for posterity. Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
Re: [ccp4bb] Deposition of riding H
Dear George, a small correction if I may: However you will not find TLS in the index, because the credit for implementing this very useful concept should be given to Martin Winn, Garib and Ethan, long after the current version of SHELXL (and its manual) were released in 1997. Acta Cryst. (*1985*). A41, 426-433 Restrained structure-factor least-squares refinement of protein structures using a vector processing computer I. Haneef, D. S. Moss, M. J. Stanford and N. Borkakoti *Abstract:* A least-squares refinement program /RESTRAIN/ has been developed, which is capable of refining macromolecular structures using structure amplitudes, phases from isomorphous replacement or anomalous scattering and pseudo-energy restraints. In addition to positional parameters and isotropic temperature factors, anisotropic mean-square displacements may be refined either as individual atomic *U* tensors or as *TLS* tensors applied to groups of atoms. Anharmonic effects may be handled by coupling together occupancies to enable the electron density of an atomic group to be distributed over more than one subsite. A novel way of restraining groups of atoms to be planar has been developed that does not require dummy atoms and does not restrain the plane to lie in its current orientation. One can find other, earlier programs, but they are small molecule specific. Regards, Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote: I wouldn't dare calling a model manipulation that typically changes the R-factor by 0.5 ... ~2% as nothing. Although, you may be right - who cares? It's not a manipulation because no parameters were manipulated in the model. Don't you agree that using the riding model does not add additional refinable parameters? But your insistence has awakened my curiosity. So I looked at hydrogens as produced by phenix.refine for a 1.8A structure I randomly picked. Just as George has pointed out, the covalent bonds are too short. For instance, when hydrogens are added, the average N-H distance is 1.1(5), but upon refinement the value is down to 0.85998(4). I won't even begin discussing the fact that some of these hydrogens added to K, Y, S etc. are placed in positions that are not justified by data (not in definitely wrong positions either, it's just that there is no evidence to support a particular torsion angle). And that it is unlikely that every histidine in the structure is fully protonated. Do you see the problem? I fully understand your desire to be able to reproduce the R-factors (although I don't necessarily share it), but if I decide to deposit this model with hydrogens, am I essentially stating that the N-H bond is magically shortened to ~0.86A? Sure, it is the driver's (PDB user's) responsibility to know the meaning of the red light (riding hydrogens), but wouldn't depositing riding hydrogens be equivalent to putting a 70 mph sign at the ramp, just because all the cops know that it's not the actual safe speed? And then telling the accident victim that there was fine print in the rule book? I think this situation is particularly problematic given that these days some enter the field the same way many people (at least so it seems here in Baltimore) get their driver's licenses, i.e. without ever learning the rules. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
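The kind of check described above, averaging N-H distances in a model that carries explicit hydrogens, can be sketched as follows. This is a minimal illustration, not the procedure Ed actually used: it assumes fixed-column PDB ATOM/HETATM records with the element symbol in columns 77-78, assigns each hydrogen to its nearest non-hydrogen atom by brute force, and ignores alternate conformations. All function names and the 1.3 A cutoff are hypothetical choices.

```python
import math


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Build a fixed-column PDB ATOM record (helper for the demo below)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}          {element:>2}")


def parse_atoms(pdb_text):
    """Return (element, (x, y, z)) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[76:78].strip().upper(), xyz))
    return atoms


def mean_nh_distance(pdb_text, cutoff=1.3):
    """Mean distance from each H to its nearest heavy atom, when that atom is N."""
    atoms = parse_atoms(pdb_text)
    heavy = [(el, xyz) for el, xyz in atoms if el != "H"]
    if not heavy:
        return None
    dists = []
    for el, xyz in atoms:
        if el != "H":
            continue
        parent_el, parent_xyz = min(heavy, key=lambda a: math.dist(a[1], xyz))
        d = math.dist(parent_xyz, xyz)
        if parent_el == "N" and d <= cutoff:
            dists.append(d)
    return sum(dists) / len(dists) if dists else None


# Tiny made-up fragment: one N-H pair and one C-H pair.
demo = "\n".join([
    atom_line(1, "N", "ALA", "A", 1, 0.000, 0.000, 0.000, "N"),
    atom_line(2, "H", "ALA", "A", 1, 0.000, 0.860, 0.000, "H"),
    atom_line(3, "CA", "ALA", "A", 1, 1.458, 0.000, 0.000, "C"),
    atom_line(4, "HA", "ALA", "A", 1, 1.458, 1.000, 0.000, "H"),
])
print(mean_nh_distance(demo))
```

The nearest-neighbour search is quadratic in the number of atoms, which is fine for a one-off check on a single structure; a spatial grid would be the obvious replacement for anything larger.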
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed, On 9/15/10 12:54 PM, Ed Pozharski wrote: On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote: I wouldn't dare calling a model manipulation that typically changes the R-factor by 0.5 ... ~2% as nothing. Although, you may be right - who cares? It's not a manipulation because no parameters were manipulated in the model. I can't agree with this, sorry. A change to the model content (especially one that changes Fcalc) is a model manipulation. Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On 9/15/10 3:54 PM, Ed Pozharski wrote: Don't you agree that using the riding model does not add additional refinable parameters? (snip) For instance, when hydrogens are added, the average N-H distance is 1.1(5), but upon refinement the value is down to 0.85998(4). So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe). Obviously this question is not one amenable to a simple answer. In some sense (as per George) riding hydrogens are merely a restraint. In some other sense they are fundamentally a part of the model - they have very directional properties via bumping restraints that most certainly alter the atomic model for the heavy atoms in a very direct way via collision. Since the nature of these atoms - locationally specific - differs from the more amorphous extended atom restraints (CH3E for methyl in CNS etc) it could make sense to include them in the model at deposition. As far as I know we do not delete atoms from the final model that contribute to scattering and geometric restraints under any other circumstances, except perhaps in the nearly-as-contentious "how do I model my disordered side-chain" case. Also not amenable to a simple answer. Both approaches (REFMAC-esque and PHENIX-esque) have their merits. I doubt I'm the only person here conflicted over what to do about it. However this thread appears to have reached the point where not much new ground is being broken. Phil Jeffrey Princeton
Re: [ccp4bb] Deposition of riding H [was: Deposition of riding H]
Dear Pavel, Stressing Ethan's point about TLS refinement becoming practical with the Refmac implementation, Winn et al. (2001) Acta D, 57, 122-133 (I know you like references) states: "Derivatives of the residual with respect to the TLS parameters are expanded in terms of the derivatives with respect to individual anisotropic U values, which in turn are calculated using a fast Fourier transform technique. TLS refinement is therefore fast and can be used routinely." Best wishes Roberto --- Dr. Roberto Steiner Randall Division of Cell and Molecular Biophysics New Hunt's House King's College London Guy's Campus London, SE1 1UL Phone +44 (0)20-7848-8216 Fax +44 (0)20-7848-6435 e-mail roberto.stei...@kcl.ac.uk
Re: [ccp4bb] Deposition of riding H
However this thread appears to have reached the point where not much new ground is being broken. As the person who started this thread I'll second Phil Jeffrey's comment. I chose to continue my depositions with riding H, and the RCSB agreed to accept the coordinates. It's been great hearing the experts weigh in on this. I've learned a lot, and clearly there is no consensus. As one of the vast majority of crystallographers dependent on all the hard work that program developers undertake to support structural biology, I'm happy to follow advice given by the developers of the various programs I use, and for Phenix the current advice is to deposit with riding H. -- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 13:13 -0700, Pavel Afonine wrote: I can't agree with this, sorry. A change to a model content (especially the one that changes Fcalc) is a model manipulation. That is not what I asked. Do you agree that using the riding model does not add additional refinable parameters? -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
I should just like to point out that the main source of the disagreement here seems to be that people have very different ideas about what a 'model' is or should be. Strictly a model is a purely mathematical construct, in this case it consists of the appropriate equation for the calculated structure factor and the best-fit values of the various parameters (scattering factors, atomic positions, occupancies, B factors, TLS parameters etc.) that appear in it. A mathematical model is inevitably going to be an imperfect representation of reality, but hopefully it's the best one we can come up with, in the sense of best explaining the data without significant overfitting. The problem arises because many users of the PDB, and I suspect many contributors to this BB, particularly non-crystallographers, don't see it like that, because they view a PDB file as a physical model, i.e. not as the best fit to the data (assuming that the non-crystallographers even know what the data are!), but the closest representation of reality. The difference between the N-H bond lengths that Ed referred to illustrates the distinction between the mathematical and the physical model. The mathematical model requires that the bond length is 0.86 Ang because that value gives the best fit of the assumed spherical scattering factor of H to the deformation density of the X-H covalent bond. The physical model requires that it be 1.00 Ang because that is the internuclear distance found by spectroscopic methods and predicted by QM calculations. The same goes for B factors and TLS: to a large extent they are a mathematical construct whose purpose is to provide an optimal fit to the data. The connection of Bs and TLS with reality is tenuous at best, nevertheless people obviously would like to have a physical interpretation such as rigid-body correlated motion. 
The fact that Bragg scattering provides no information about correlated motion (you need to measure the diffuse scattering for that) doesn't seem to deter them! I have no doubt in my mind that it is the mathematical model that should be published, because hopefully it's the best available interpretation of the data. Whether that involves publishing the riding H atoms explicitly, or alternatively the formulae and parameters that were used to calculate their positions I don't mind, as long as I can faithfully reproduce the Fcalcs to check the validity of the model. Then users of the PDB are free to *interpret* the mathematical models as physical models in an appropriate manner (e.g. by adjusting the bond lengths to H), and crystallographers have the untainted mathematical models needed to reproduce the Fcalcs. Cheers -- Ian
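For readers who want the "mathematical model" above written out, the following is a simplified sketch of the calculated structure factor and the conventional R-factor. It assumes isotropic B factors and omits bulk-solvent and anisotropic scaling terms, which real refinement programs include; the symbols ($q_j$ for occupancy, $f_j$ for the scattering factor, $k$ for an overall scale, $g$ for the geometric placement rule of a riding hydrogen) are notation chosen here, not taken from the thread.

```latex
% Calculated structure factor for reflection h = (h, k, l),
% summed over all atoms j in the unit cell, with s = 2 sin(theta) / lambda:
F_{\mathrm{calc}}(\mathbf{h})
  = \sum_{j} q_j \, f_j(s) \,
    \exp\!\left(-\tfrac{1}{4} B_j s^{2}\right)
    \exp\!\left(2\pi i \, \mathbf{h}\cdot\mathbf{x}_j\right)

% Conventional R-factor comparing observed and calculated amplitudes:
R = \frac{\sum_{\mathbf{h}} \bigl|\, |F_{\mathrm{obs}}(\mathbf{h})| - k\,|F_{\mathrm{calc}}(\mathbf{h})| \,\bigr|}
         {\sum_{\mathbf{h}} |F_{\mathrm{obs}}(\mathbf{h})|}

% A riding hydrogen enters the sum like any other atom, but its position is
% a fixed function of its parent and neighbouring heavy atoms and an assumed
% X-H distance d_{XH}, so it contributes to F_calc without adding parameters:
\mathbf{x}_{\mathrm{H}} = g\!\left(\mathbf{x}_{\mathrm{parent}}, \mathbf{x}_{\mathrm{neighbours}};\, d_{X\mathrm{H}}\right)
```

Written this way, both positions in the thread are visible at once: the hydrogens change $F_{\mathrm{calc}}$ and hence $R$, yet they carry no refinable parameters of their own.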
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote: So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe). My impression is that the default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output. Without breaking any new ground, there is really no conflict here. Is it a good idea to make a complete model description (including riding hydrogens, input files, cif-files, special case restraints etc) available for structures deposited in the PDB? Absolutely. But not in this form, where the model implies that we know the protonation states of all the atoms and has unreasonable geometry. For the example that I provided, the rmsd_bonds for that particular group is 0.14A, certainly unacceptable. Maybe one can use a different record for these atoms, say RIDING instead of ATOM. Thus the complete model can be recovered and at the same time the nature of these items is explicitly stated. In this way riding hydrogens are clearly distinguished from those that are actually refined at ultrahigh resolution. Cheers, Ed. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
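The RIDING idea can be illustrated with a short sketch. Note that RIDING is purely the hypothetical record name proposed in the message above: it is not part of the PDB format, and no existing program reads it. The sketch assumes fixed-column ATOM records with the element symbol in columns 77-78 and, for simplicity, leaves HETATM records untouched; the function names are made up for illustration.

```python
def tag_riding_hydrogens(pdb_text):
    """Rename ATOM records of hydrogens to the hypothetical RIDING record."""
    out = []
    for line in pdb_text.splitlines():
        element = line[76:78].strip().upper()
        if line.startswith("ATOM  ") and element == "H":
            line = "RIDING" + line[6:]  # same width, so columns stay aligned
        out.append(line)
    return "\n".join(out)


def untag_riding_hydrogens(pdb_text):
    """Turn RIDING records back into ordinary ATOM records."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("RIDING"):
            line = "ATOM  " + line[6:]
        out.append(line)
    return "\n".join(out)


def strip_riding_hydrogens(pdb_text):
    """Drop RIDING records entirely, leaving only the refined atoms."""
    return "\n".join(l for l in pdb_text.splitlines() if not l.startswith("RIDING"))
```

Because "RIDING" and "ATOM  " are both six characters wide, the round trip is lossless: a program that wants the complete model calls the untag step, while one that wants only refined atoms drops the tagged records.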
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed, On 9/15/10 2:47 PM, Ed Pozharski wrote: On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote: So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe). My impression is that default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output. No, if your input file contains H atoms, the output file will contain them too (in phenix.refine). You don't have to change any parameters for this. Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Sure. But if I start with a model that has no hydrogens, they will be generated but not passed to the output, right? Just like refmac. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed, no, if you start with a model that has no hydrogens, they will not be generated internally. Pavel.
Re: [ccp4bb] Deposition of riding H
Pavel: In my original email I very carefully gave credit for IMPLEMENTING the TLS concept. Of course the ideas and some programs had been around long before, but it was the IMPLEMENTATION IN REFMAC that resulted in TLS becoming widely used. I had actually considered putting it into SHELXL but had not done so for two reasons (a) I was too lazy and (b) I missed an essential trick that REFMAC introduced, namely the combination of TLS with an additive isotropic B-value for each atom. Dale: You are quite correct that AFIX 137 breaks my argument about not depositing (SHELX) hydrogen atoms because they can be recalculated with no loss of experimental information. However to be fair, if you generate the first .ins file using SHELXPRO (the recommended procedure) you will get AFIX 33 that doesn't have this problem. For Pavel and others unfamiliar with SHELXL, AFIX 33 is a pure riding model with a staggered methyl group but AFIX 137 assumes local threefold symmetry, finds the initial torsion angle by a three-fold averaged fit to the difference density and then refines the torsion angle in the following cycles. Since this torsion angle is not given explicitly in the output files, if AFIX 137 hydrogens are not deposited, they cannot be regenerated except by a full refinement against the experimental data. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582
Re: [ccp4bb] Deposition of riding H
Hi All, I have not read all messages in the trace so my apologies if somebody already pointed out what I have to say. There is lot of talk about how this or that software treats the riding hydrogens. What to do with the fact that however these hydrogens are treated in calculations, they are, in most cases, still assumed? Meents et al http://scripts.iucr.org/cgi-bin/paper?xh0004 showed that proteins are stripped of hydrogens during X-ray data collection. So, IMHO it is a good argument against depositing the H coordinates in PDB. Cheers, N. Ruslan Sanishvili (Nukri), Ph.D. GM/CA-CAT Biosciences Division, ANL 9700 S. Cass Ave. Argonne, IL 60439 Tel: (630)252-0665 Fax: (630)252-0667 rsanishv...@anl.gov -Original Message- From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of George M. Sheldrick Sent: Wednesday, September 15, 2010 5:14 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Deposition of riding H Pavel: In my original email I very carefully gave credit for IMPLEMENTING the TLS concept. Of course the ideas and some programs had been around long before, but it was the IMPLEMENTATION IN REFMAC that resulted in TLS becoming widely used. I had actually considered putting it into SHELXL but had not done so for two reasons (a) I was too lazy and (b) I missed an essential trick that REFMAC introduced, namely the combination of TLS with an additive isotropic B-value for each atom. Dale: You are quite correct that AFIX 137 breaks my argument about not depositing (SHELX) hydrogen atoms because they can be recalculated with no loss of experimental information. However to be fair, if you generate the first .ins file using SHELXPRO (the recommended procedure) you will get AFIX 33 that doesn't have this problem. 
For Pavel and others unfamiliar with SHELXL, AFIX 33 is a pure riding model with a staggered methyl group but AFIX 137 assumes local threefold symmetry, finds the initial torsion angle by a three-fold averaged fit to the difference density and then refines the torsion angle in the following cycles. Since this torsion angle is not given explicitly in the output files, if AFIX 137 hydrogens are not deposited, they cannot be regenerated except by a full refinement against the experimental data. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-22582 On Wed, 15 Sep 2010, George M. Sheldrick wrote: Dear Pavel, May I suggest that you take a look at the SHELX manual: http://shelx.uni-ac.gwdg.de/SHELX/shelx.pdf before sending your SHELX questions to CCP4bb? You might even find some good ideas for implementing in phenix_refine! For example, if you look up 'non-crystallographic symmetry' in the index you will discover that SHELXL applies NCS in the form of restraints, not constraints, which has the advantage that it can be applied locally and in combination with all other restraints and constraints involving the same atoms. However you will not find TLS in the index, because the credit for implementing this very useful concept should be given to Martin Winn, Garib and Ethan, long after the current version of SHELXL (and its manual) were released in 1997. And because SHELXL only reads one instruction file (*.ins) and one reflection file (*.hkl) but no other data files or libraries, and FORTRAN will always be FORTRAN, the deposition of these two files would be sufficient to define the refinement for posterity. Best wishes, George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. 
+49-551-39-22582 On Wed, 15 Sep 2010, Pavel Afonine wrote: Dear Felix, Shortly Shelxl knows crystallography best. I have no doubts about this. My questions were, though: - how does the .ins file encode the information about NCS groups used in refinement (atom selection for NCS groups, restraint weights for different groups, etc.)? - how does the .ins file encode the information about TLS (again, atom selections for TLS groups, TLS matrices, etc.)? Related, does it have a concept of having TLS and other components to the total atomic displacement parameter (ADP)? - If I recall it correctly, to fix (=not refine) a certain parameter (say occupancy or B-factor) in Shelxl you need to add the number 10 to it. Is that true? IMHO, this might lead to confusion if such a file gets deposited to the PDB. Best, Pavel.
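The difference George draws between AFIX 33 and AFIX 137 can be illustrated with a minimal sketch. This is not SHELXL code: the function names, the 0.97 A C-H distance and the 109.5 degree angle are illustrative assumptions. With a fixed staggered torsion the methyl hydrogens are a pure function of the heavy-atom coordinates, whereas a refined torsion is one extra number per methyl group that has to be stored somewhere if the hydrogens are to be regenerated.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _unit(u):
    n = math.sqrt(_dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)

def place_atom(a, b, c, dist, angle_deg, torsion_deg):
    """Place D with |C-D| = dist, angle B-C-D = angle_deg, torsion A-B-C-D = torsion_deg."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal to the A-B-C plane
    m = _cross(n, bc)                   # in-plane direction pointing toward A's side
    ang, tor = math.radians(angle_deg), math.radians(torsion_deg)
    along = -dist * math.cos(ang)
    in_plane = dist * math.sin(ang) * math.cos(tor)
    out_of_plane = dist * math.sin(ang) * math.sin(tor)
    return tuple(c[i] + along * bc[i] + in_plane * m[i] + out_of_plane * n[i]
                 for i in range(3))

def methyl_hydrogens(y, x, c, torsion_deg=60.0, dist=0.97, angle_deg=109.5):
    """Three H atoms on methyl carbon c (bonded to x, which is bonded to y).

    torsion_deg=60 gives the staggered arrangement (the fixed riding case);
    any other value mimics a refined torsion. Distance and angle are
    illustrative placeholders, not values taken from any program.
    """
    return [place_atom(y, x, c, dist, angle_deg, torsion_deg + 120.0 * k)
            for k in range(3)]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.degrees(math.atan2(_dot(b2, _cross(n1, n2)),
                                   math.sqrt(_dot(b2, b2)) * _dot(n1, n2)))
```

Calling `methyl_hydrogens(y, x, c)` needs only heavy-atom positions, while `methyl_hydrogens(y, x, c, torsion_deg=37.2)` needs the extra torsion value (37.2 is an arbitrary example), which is the piece of information George says is not written to the output files.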
[ccp4bb] Deposition of riding H
Here's one for the community, which I'll post to both Phenix and CCP4 BBs.

Where does the crystallographic community stand on deposition of coordinates with riding hydrogens?

Explicit H are required for calculating all-atom clash scores with MolProbity, and their use frequently gives better geometry (especially at low resolution). Phenix uses explicit riding H for refinement, and outputs these in the refined PDB. Refmac also uses riding H but does not output H coordinates.

While depositing a series of structures refined at 1.4 - 2.75 A with Phenix, I got the following email from the RCSB, who asked that I resupply coordinates without H for two of the structures. Since we can't see H even at 1.4 Å, I don't understand why an arbitrary cutoff of 1.5 Å was chosen, and also why explicit H atoms used in refinement and geometry validation should be stripped from the file.

FROM RCSB:
We encourage depositors not to use hydrogens in the final PDB file for the low resolution structures ( 1.5 A). Please provide an updated PDB file. We request you to use processed PDB file as a starting point for making any corrections to the coordinates and/or re-refinement.

-- 
Mark
Re: [ccp4bb] Deposition of riding H
On Tuesday 14 September 2010 10:34:00 am Dr. Mark Mayer wrote:

> Here's one for the community, which I'll post to both Phenix and CCP4 BBs. Where does the crystallographic community stand on deposition of coordinates with riding hydrogens?

I do not favor depositing riding hydrogen coordinates for the same reason that I do not like the recent PDB preference for depositing ANISOU records for structures that have been refined with TLS. In both cases the enumeration of these many thousands of parameter values gives the strong, but false, impression that they have been individually modeled. They have not.

There are really only a dozen or so parameters in the riding hydrogen model. All those coordinates follow directly from application of this same small set of values. Similarly, there are really only 20 parameters per TLS group in your model, no matter how many atoms you applied it to. There is IMHO no justification for presenting the resulting model in a form that makes it appear that 6 additional parameters per atom have been modeled, when in fact that number is either 0 or 1.

Ethan

> Explicit H are required for calculating all atom clash scores with Molprobity, and their use frequently gives better geometry (especially at low resolution). Phenix uses explicit riding H for refinement, and outputs these in the refined PDB. Refmac also uses riding H but does not output H coordinates.
>
> While depositing a series of structures refined at 1.4 - 2.75 A with Phenix got the following email from the RCSB, who asked I resupply coordinates without H for two of the structures. Since we can't see H even at 1.4 Å I don't understand why an arbitrary cut off of 1.5 Å was chosen, and also why explicit H atoms used in refinement and geometry validation should be stripped from the file.
>
> FROM RCSB
> We encourage depositors not to use hydrogens in the final PDB file for the low resolution structures ( 1.5 A). Please provide an updated PDB file. We request you to use processed PDB file as a starting point for making any corrections to the coordinates and/or re-refinement.

-- 
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
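Ethan's counting argument can be made concrete with a small sketch. The function name and the group sizes are hypothetical. The 20 parameters per group are the usual count for a TLS description (6 for the symmetric T tensor, 6 for the symmetric L tensor, 8 for S, whose trace is indeterminate), and the "0 or 1" per atom is read here as the optional residual isotropic B-value per atom, which is an interpretation of his remark rather than something he spells out.

```python
def tls_parameter_summary(atoms_per_group, residual_b_per_atom=True):
    """Compare parameters actually refined under TLS with the values listed as ANISOU.

    atoms_per_group: list with the number of atoms in each TLS group
    (hypothetical input for illustration).
    """
    n_atoms = sum(atoms_per_group)
    tls_params = 20 * len(atoms_per_group)            # 6 (T) + 6 (L) + 8 (S) per group
    per_atom = n_atoms if residual_b_per_atom else 0  # 1 or 0 extra parameters per atom
    return {
        "refined": tls_params + per_atom,
        "listed_as_anisou": 6 * n_atoms,              # six U_ij values per ANISOU record
    }

# Two hypothetical TLS groups of 1000 atoms each
summary = tls_parameter_summary([1000, 1000])
```

For this made-up case the model has 2040 refined displacement parameters, while the deposited file would list 12000 ANISOU values; without per-atom residual B-values the refined count drops to 40.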
Re: [ccp4bb] Deposition of riding H
Even though SHELXL refinements often involve resolutions of 1.5 A or better, I discourage SHELXL users from depositing their hydrogen coordinates. There are three reasons:

1. The C-H, N-H and O-H distances required to give the best fit to the electron density are significantly shorter than those required for molecular modeling and tests on non-bonded interactions (or located by neutron diffraction). It is ESSENTIAL to recalculate the hydrogens at longer distances before using MolProbity and other validation software.

2. There is considerable confusion concerning the names to be assigned to the hydrogens. This is not made easier by the application of a chirality test to -CH2- groups!

3. O-H hydrogens are particularly difficult to 'see' and the geometrical calculation of their positions is often ambiguous. The same applies to the protonation states of histidines and carboxylic acids. In addition such hydrogen positions are often disordered.

For refinement I recommend including C-H and N-H but not O-H hydrogens. For very high resolution structures this reduces Rfree by 0.5-1.0% and clearly improves the model. At all resolutions the antibumping restraints involving hydrogens are useful.

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, University of Goettingen,
Tammannstr. 4, D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582

On Tue, 14 Sep 2010, Dr. Mark Mayer wrote:

> Here's one for the community, which I'll post to both Phenix and CCP4 BBs. Where does the crystallographic community stand on deposition of coordinates with riding hydrogens?
>
> Explicit H are required for calculating all atom clash scores with Molprobity, and their use frequently gives better geometry (especially at low resolution). Phenix uses explicit riding H for refinement, and outputs these in the refined PDB. Refmac also uses riding H but does not output H coordinates.
>
> While depositing a series of structures refined at 1.4 - 2.75 A with Phenix got the following email from the RCSB, who asked I resupply coordinates without H for two of the structures. Since we can't see H even at 1.4 Å I don't understand why an arbitrary cut off of 1.5 Å was chosen, and also why explicit H atoms used in refinement and geometry validation should be stripped from the file.
>
> FROM RCSB
> We encourage depositors not to use hydrogens in the final PDB file for the low resolution structures ( 1.5 A). Please provide an updated PDB file. We request you to use processed PDB file as a starting point for making any corrections to the coordinates and/or re-refinement.
>
> -- 
> Mark
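George's first point, recalculating hydrogens at longer distances before validation, amounts to moving each H along its existing bond direction. The sketch below shows only that geometric step. The function name and the two bond lengths (0.97 A as a short X-ray-style C-H distance, 1.09 A as a longer one) are illustrative assumptions, not values taken from SHELXL or MolProbity, and in practice one would let the validation software rebuild the hydrogens itself.

```python
import math

def rescale_bond(parent, hydrogen, new_length):
    """Move a hydrogen along the parent->H direction so the bond has new_length."""
    direction = [h - p for h, p in zip(hydrogen, parent)]
    old_length = math.sqrt(sum(d * d for d in direction))
    if old_length == 0.0:
        raise ValueError("hydrogen coincides with its parent atom")
    scale = new_length / old_length
    return tuple(p + d * scale for p, d in zip(parent, direction))

# Illustrative values only: a C-H bond of 0.97 A stretched to 1.09 A
carbon = (0.0, 0.0, 0.0)
h_short = (0.0, 0.0, 0.97)
h_long = rescale_bond(carbon, h_short, 1.09)
```

The bond direction is unchanged, so angles and torsions involving the hydrogen are preserved; only the X-H distance differs between the refinement and validation copies of the model.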
Re: [ccp4bb] Deposition of riding H
Hi Ethan,

> I do not favor depositing riding hydrogen coordinates for the same reason that I do not like the recent PDB preference for depositing ANISOU records for structures that have been refined with TLS. In both cases the enumeration of these many thousands of parameter values gives the strong, but false, impression that they have been individually modeled. They have not.

Following this logic, one could say that the individual x,y,z coordinates listed in ATOM records for a structure refined at very low resolution using rigid-body refinement only (or torsion angle Simulated Annealing only) also may make a false impression that these coordinates were refined individually.

Pavel.
Re: [ccp4bb] Deposition of riding H
On Tuesday 14 September 2010 12:44:37 pm Pavel Afonine wrote:

> Hi Ethan,
>
>> I do not favor depositing riding hydrogen coordinates for the same reason that I do not like the recent PDB preference for depositing ANISOU records for structures that have been refined with TLS. In both cases the enumeration of these many thousands of parameter values gives the strong, but false, impression that they have been individually modeled. They have not.
>
> Following this logic, one could say that the individual x,y,z coordinates listed in ATOM records for a structure refined at very low resolution using rigid-body refinement only (or torsion angle Simulated Annealing only) also may make a false impression that these coordinates were refined individually.

I agree with this, at least for the case of true rigid-body refinement. But you would still need to describe somehow the coordinates of all the atoms in your rigid model. If it came straight out of the PDB, then in principle it would suffice to give the PDB+CHAIN code and the rotation/translation matrix. But if any adjustments were made, which I think is typical if only to correct for sequence differences, then as a practical matter you still need to provide the true starting coordinates. And at that point you might as well provide the ending coordinates instead, since it's the same amount of information.

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
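As a rough illustration of the "reference model plus rotation/translation" description Ethan mentions, the sketch below regenerates every atom position from unchanged reference coordinates and a single rigid-body operator. The function names and the toy coordinates are hypothetical; a real case would take the reference coordinates from the cited PDB entry and chain.

```python
import math

def rotation_z(angle_deg):
    """3x3 rotation matrix for a rotation about the z axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def apply_rigid_body(coords, rotation, translation):
    """Return R x + t for every atom position x in coords."""
    moved = []
    for x in coords:
        rotated = [sum(rotation[i][j] * x[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(r + t for r, t in zip(rotated, translation)))
    return moved

# Toy reference model (three atoms) and one rigid-body operator
reference = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
placed = apply_rigid_body(reference, rotation_z(90.0), (0.0, 0.0, 1.0))
```

However many atoms `reference` holds, the refinement contributed only the six rigid-body degrees of freedom. As Ethan notes, this compact description stops being sufficient once the reference coordinates themselves have been edited.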
Re: [ccp4bb] Deposition of riding H
Hi Ethan,

yes, you are absolutely right, you would need to define somehow where your model is... But you could, at least hypothetically, use non-atomic models for this! Like cylinders (*), spheres and similar shapes. This is what the density looks like at those super-low resolutions.

(*)
Borovikov, B. A., Vainstein, B. K., Gelfand, I. M. & Kalinin, D. I. (1979). Kristallografiya, 24, 227-238.
Kalinin, D. I. (1980). Kristallografiya, 25, 535-544.
Lunin, V. Yu., Urzhumtsev, A. G., Vernoslova, E. A., Chirgadze, Yu. N., Neveskaya, N. A. & Fomenkova, N. P. (1985). Acta Cryst. A41, 166-171.

Anyway, it looks like we are about to diverge from the original subject, so I'll stop rambling -:)

All the best!
Pavel.

On 9/14/10 1:10 PM, Ethan Merritt wrote:

> On Tuesday 14 September 2010 12:44:37 pm Pavel Afonine wrote:
>
>> Hi Ethan,
>>
>>> I do not favor depositing riding hydrogen coordinates for the same reason that I do not like the recent PDB preference for depositing ANISOU records for structures that have been refined with TLS. In both cases the enumeration of these many thousands of parameter values gives the strong, but false, impression that they have been individually modeled. They have not.
>>
>> following this logic one could say that the individual x,y,z coordinates listed in ATOM records for a structure refined at very low resolution using rigid-body refinement only (or torsion angle Simulated Annealing only) also may make a false impression that these coordinates were refined individually.
>
> I agree with this, at least for the case of true rigid-body. But you would still need to describe somehow the coordinates of all the atoms in your rigid model. If it came straight out of the PDB, then in principle it would suffice to give the PDB+CHAIN code and the rotation/translate matrix. But if any adjustments were made, which is I think typical if only to correct for sequence differences,then as a practical matter you still need to provide the true starting coordinates. And at that point you might as well provide the ending coordinates instead, since it's the same amount of information.
>
> Ethan
Re: [ccp4bb] Deposition of riding H
Mark,

On Tue, 2010-09-14 at 13:34 -0400, Dr. Mark Mayer wrote:

> Where does the crystallographic community stand on deposition of coordinates with riding hydrogens?

Surely the community is divided on this. Arguments could be made both ways. Personally, I think that riding hydrogens can be calculated if necessary using the same algorithms/parameters employed during refinement.

It is true that different programs may use different parameter sets, and reproducing exactly the same set of riding hydrogens may be difficult without exact knowledge of which version was used and the ability to unearth that old version of the software. This may preclude one from getting exactly the same riding hydrogen positions (how large that difference would be I honestly don't know). But really, who cares? What is the benefit of knowing exactly where this or that riding hydrogen was? Maybe there is some benefit of such a comparison in method development, but I would think it's rather limited.

I wholeheartedly agree with Ethan (even though that is not strictly what he said :) that some minor benefit here is completely negated by the danger of the perception that somehow models tell us where hydrogens are. It is bad enough that, in my estimate, roughly 10% of atomic coordinates in the PDB are unwarranted, as they come from disordered residues with exact spatial positions unsupported by electron density. Let's not add more things that PDB users may over-interpret.

Cheers,
Ed.