Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Nicholas,

Thank you for your reply.

snip it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. /snip

This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do, mainly because

(a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),

(b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,

(c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed?)].

I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

I don't quite understand your point. We currently deposit electron densities and movies, so I don't see how depositing an energy minimized structure is so difficult. It doesn't need to be in the same pdb file as the model used in refinement nor does it need to be deposited into the PDB server, but even if it does, is it not possible to have it as a new Chain or new atom type in the current pdb file format?

ps. May I say parenthetically that making the deposited models dependent on their intended usage, would possibly qualify as 'fraud' ;-)

I don't quite understand this either. When I prepare a protein model for simulation, I remove all alternative conformations, add hydrogens, and then minimize the structure. If I make such a minimized structure available for others to use with full disclosure, how would that constitute fraud? I was going to start offering minimized models of our future structures on our lab website, but if that constitutes fraud, then I might have to rethink.

I don't know enough to argue with anyone here and that's not the intention of my posts - I am just trying to help figure out a way to resolve a significant problem that is likely to resurface down the road. It would be helpful if the more experienced people here could start a discussion of 'how to resolve' the problems exposed by this thread so far - assuming that you agree that it's a problem worth your time.

Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology, Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122
Phone: 317-274-4371 Fax: 317-274-4686
email: qqho...@iupui.edu

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
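The first preparation step Quyen describes (removing alternative conformations) can be sketched in a few lines. This is only an illustration, not a recommended protocol: the function name `strip_altlocs` is made up here, the choice to keep conformer 'A' is an assumption, and occupancies are deliberately left untouched. It relies only on the fixed-column PDB convention that the alternate location indicator sits in column 17 of ATOM/HETATM records.

```python
def strip_altlocs(pdb_lines, keep="A"):
    """Keep atoms whose altLoc is blank or equal to `keep`; blank the altLoc field.

    Assumption: fixed-column PDB records, altLoc in column 17 (index 16).
    Occupancies are NOT reset here; a real preparation step may want to do that.
    """
    cleaned = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 16:
            altloc = line[16]
            if altloc not in (" ", keep):
                continue  # drop every other conformer
            line = line[:16] + " " + line[17:]
        cleaned.append(line)
    return cleaned


# Hypothetical three-atom fragment with a two-conformer CB
example = [
    "ATOM      1  N   SER A  10      11.000  12.000  13.000  1.00 20.00           N",
    "ATOM      2  CB ASER A  10      12.000  13.000  14.000  0.60 22.00           C",
    "ATOM      3  CB BSER A  10      12.500  13.200  14.100  0.40 24.00           C",
]
for line in strip_altlocs(example):
    print(line)
```

Adding hydrogens and energy minimization are left to whatever simulation package is in use; nothing here is meant to replace the refined model that goes into the deposition.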
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Ethan,

mainly because (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),

I do not think that is a valid criticism. In any field of science one might hypothesize that conducting a different kind of experiment and fitting it in accordance with a different theory would produce a different model. But that is only a hypothetical; it does not invalidate the analysis of the experiment you did do based on the data you did collect.

For the example I mentioned (diffuse scattering), the experiment would be identical. Although using only a subset of the available information may not invalidate the analysis performed, it is still not the best that can be done with the data in hand.

(b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,

I don't know of anyone who consciously avoids using their prior knowledge to inform their current work. But yes, people with more experience may in the end deposit better models than people with little experience. That's why it is valuable to have automated tools like Molprobity to check a proposed model against established prior expectations. It's also one way this bulletin board is valuable, because it allows those with less experience to ask advice from those with more experience.

Most people would like to think that the models they deposit correspond to an 'objective' representation of the experimentally accessible physical reality. The validation tools, mainly by enforcing a uniformity of interpretation, discourage (rather than encourage) the incorporation in the model of prior knowledge about the problem at hand, and thus offer their users the safety of an 'objectively validated model'.

(c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed?)].

Could you expand on this point? I am not aware of any restriction on multiple backbone conformations, now or ever. It is true that our refinement programs have not always been very well suited to refine such a model, but that is not a fault of the PDB format.

I stand corrected on that. It was probably just me :-)

I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

That would be true if the refinement is conducted in real space. However, it is nearly universal to do the final refinement in reciprocal space.

The emphasis of what I said was clearly on model building, and not on the refinement methodology. The reference to the refinement program was again model-centric (ranging from the treatment of hydrogens, to the bulk solvent model used).

Best regards,
Nicholas

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
snip it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. /snip

This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do, mainly because

(a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),

(b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,

(c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed?)].

I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

My two cents,
Nicholas

ps. May I say parenthetically that making the deposited models dependent on their intended usage, would possibly qualify as 'fraud' ;-)

--
Dr Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Saturday 18 September 2010, Nicholas M Glykos wrote:

snip it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. /snip

This has been discussed before on this list, but allow me to repeat it: You would have expected that the crystallographers' aim would be to deposit the model that maximises the product (likelihood * prior). Clearly, this is not what we do,

I guess I have more faith that we do in fact aim for that. Our data, programs, models, and insight are imperfect, but we do our best with what we have.

mainly because (a) the calculation of likelihood is only based on a subset of the 'data' that are obtained from an X-ray diffraction experiment (for example, we ignore diffuse scattering as Ian pointed out),

I do not think that is a valid criticism. In any field of science one might hypothesize that conducting a different kind of experiment and fitting it in accordance with a different theory would produce a different model. But that is only a hypothetical; it does not invalidate the analysis of the experiment you did do based on the data you did collect.

(b) we consciously avoid 'prior' because this would make the models 'subjective', meaning that better informed people would deposit (for the same data) different models than the less well informed,

I don't know of anyone who consciously avoids using their prior knowledge to inform their current work. But yes, people with more experience may in the end deposit better models than people with little experience. That's why it is valuable to have automated tools like Molprobity to check a proposed model against established prior expectations. It's also one way this bulletin board is valuable, because it allows those with less experience to ask advice from those with more experience.

(c) the format of the PDB does not offer much room for 'creative interpretations' of the electron density maps [for example, you can't have discrete disorder on the backbone (or has this changed?)].

Could you expand on this point? I am not aware of any restriction on multiple backbone conformations, now or ever. It is true that our refinement programs have not always been very well suited to refine such a model, but that is not a fault of the PDB format.

I sense that what is being deposited is not the 'best model' in any conceivable way, but the model that 'best' accounts for the final 2mFo-DFc map within the limitations of the program used for the final refinement.

That would be true if the refinement is conducted in real space. However, it is nearly universal to do the final refinement in reciprocal space. If a maximum likelihood residual is used, the aim is to achieve the best model in the generally accepted formal sense of being the set of model parameter values that provide the most likely explanation for the observed data. The priors are imposed as restraints; the partial residual R_crystallographic(Fo, Fc) encompasses the agreement with the observed data.

My twocents, Nicholas

And mine in return :-)

Ethan
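Ethan's point that priors enter as restraints can be illustrated with a toy one-parameter problem. The sketch below is not how any refinement program is implemented: it assumes Gaussian errors for both the data and the restraint, and the function names and all numbers are invented for illustration. Under those assumptions, maximising (likelihood * prior) is the same as minimising a data residual plus a restraint residual, and the minimum is a precision-weighted mean.

```python
def neg_log_posterior(x, observations, sigma_obs, ideal, sigma_restraint):
    """-log(likelihood * prior), up to an additive constant, for Gaussian errors."""
    data_term = sum((obs - x) ** 2 for obs in observations) / (2 * sigma_obs ** 2)
    restraint_term = (x - ideal) ** 2 / (2 * sigma_restraint ** 2)
    return data_term + restraint_term


def map_estimate(observations, sigma_obs, ideal, sigma_restraint):
    """Closed-form minimiser: precision-weighted mean of data and restraint target."""
    w_data = len(observations) / sigma_obs ** 2
    w_prior = 1.0 / sigma_restraint ** 2
    mean_obs = sum(observations) / len(observations)
    return (w_data * mean_obs + w_prior * ideal) / (w_data + w_prior)


# Hypothetical numbers: three noisy "measurements" of a bond length (Angstrom)
observations = [1.36, 1.30, 1.39]
best = map_estimate(observations, sigma_obs=0.05, ideal=1.33, sigma_restraint=0.02)
print(round(best, 4))
```

A tight restraint (small `sigma_restraint`) pulls the estimate toward the ideal value, and a loose one lets the data dominate, which is one way to read the disagreement above about how much 'prior' actually ends up in a deposited model.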
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Pavel,

On 16.09.10 17:56, Pavel Afonine wrote:

Hi Dirk,

so, wouldn't the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for whatever purpose they are needed, irrespective of adding riding hydrogens or not.

it is a great idea and if you look in PDB deposited structure factors there are a number of them (but certainly not the majority) that are accompanied by Fcalc. However, a few things to keep in mind:

- Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot, and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand by request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make previously calculated Fcalc inconsistent with such a manipulated model. So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, even if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).

but this is exactly what one shouldn't do: manipulate the structure after the final refinement! And if you manipulate it for a good reason, do a last final refinement after that, before depositing coordinates and structure factors. Then, there will be no problems, as far as I can see.

Best regards,
Dirk

--
***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
***
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dirk,

- Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot, and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand by request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make previously calculated Fcalc inconsistent with such a manipulated model. So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, even if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).

but this is exactly what one shouldn't do: manipulate the structure after the final refinement! And if you manipulate it for a good reason, do a last final refinement after that, before depositing coordinates and structure factors. Then, there will be no problems, as far as I can see.

I apologize if what I wrote doesn't read clearly - this is exactly what I'm saying, in this particular reply and across the whole discussion. Note, I used the word "unfortunately" above.

Anyway, saying it again: what I mentioned is based on my (and not only my - see relevant papers) observations from running validation tools through the whole PDB and making note of such manipulated structures. It is a matter of fact that there are some intentionally or unintentionally manipulated models; it is very bad, it is unfortunate and obviously I'm strictly against it. I'm against it to such a degree that I even took the trouble to write a paper on this matter, which I mentioned on this thread already: J. Appl. Cryst. 2010, 43, 669-67.

Therefore it is important to have a model that you can use to reproduce the reported statistics (for validation, at least), although having Fcalc around wouldn't hurt.

Sorry again, if I wasn't clear in my previous reply.

All the best!
Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
As a relatively inexperienced scientist, I find this discussion fascinating. I wonder if NMR and EM people are also worried about depositing enough modeled info to allow back calculation of data.

Regarding the original discussion of whether to deposit riding hydrogens used in the refinement, it seems that we are trying to deposit one model to satisfy two different purposes - one for model validation and the other for model interpretation (use in docking etc), and what's good for one purpose might not be necessarily good for the other. I wonder if it would help to deposit two different models: one that precisely reflects the model used in refinement and the other an energy minimized model with predicted hydrogens and alternative conformations removed?

Cheers,
Quyen

__
Quyen Hoang, Ph.D
Assistant Professor
Department of Biochemistry and Molecular Biology, Stark Neurosciences Research Institute
Indiana University School of Medicine
635 Barnhill Drive, Room MS0013D
Indianapolis, Indiana 46202-5122
Phone: 317-274-4371 Fax: 317-274-4686
email: qqho...@iupui.edu

On Sep 17, 2010, at 8:28 AM, Ian Tickle wrote:

Oh, goodness, I see: even here, we would need clear rules what the calculated structure factors are, what the weights were, which bulk solvent correction was applied ... a maze, too!

Fortunately the X-ray restraint weights/target values are not an issue here: varying them changes the refined model parameters of course, but they do not appear in the structure factor formula, so don't need to be specified in the mathematical model to obtain the Fcalcs. You would of course need to know all the weights and target values (as well as the SF formula) to reproduce the refinement to get the deposited model.

But could future programs really re-calculate the same structure factors from the deposited model? Because of the expected development of more advanced methods and algorithms, I have my doubts ... *sigh*

Yes, if the deposited mathematical model is completely specified in terms of the SF formula used and the values of *all* the parameters that go into it, then in principle future versions of software using more advanced models will be able to reproduce the exact Fcalcs. This assumes that the advanced models will use the same 'core' formula but with additional terms and adjustable parameters, so that the simple model can be obtained from the advanced one by constraining the extra parameters to fixed values. However if the simple model is not 'nested' inside the more advanced model in this way, then no, it will not be possible to reproduce the Fcalcs. However, as I implied, the main issue is that we're rather lax at fully specifying our models (both formulae and parameters): obviously if in future you don't have all the information you need to reproduce the calculation then you have no hope of getting the same Fcalcs!

Cheers
-- Ian
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Very interesting discussion. I wonder if the inexperienced user of PDB really exists? I don't know anyone off-hand who would really make use of information from hydrogen positions but not understand the issues. Although I hear they have been sighted in the Everglades http://en.wikipedia.org/wiki/Skunk_ape Kendall
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ian and contributors to this interesting thread,

(please, scroll down a little bit)

On 15.09.10 23:34, Ian Tickle wrote:

I should just like to point out that the main source of the disagreement here seems to be that people have very different ideas about what a 'model' is or should be. Strictly a model is a purely mathematical construct, in this case it consists of the appropriate equation for the calculated structure factor and the best-fit values of the various parameters (scattering factors, atomic positions, occupancies, B factors, TLS parameters etc.) that appear in it. A mathematical model is inevitably going to be an imperfect representation of reality, but hopefully it's the best one we can come up with, in the sense of best explaining the data without significant overfitting.

The problem arises because many users of the PDB, and I suspect many contributors to this BB, particularly non-crystallographers, don't see it like that, because they view a PDB file as a physical model, i.e. not as the best fit to the data (assuming that the non-crystallographers even know what the data are!), but the closest representation of reality.

The difference between the N-H bond lengths that Ed referred to illustrates the distinction between the mathematical and the physical model. The mathematical model requires that the bond length is 0.86 Ang because that value gives the best fit of the assumed spherical scattering factor of H to the deformation density of the X-H covalent bond. The physical model requires that it be 1.00 Ang because that is the internuclear distance found by spectroscopic methods and predicted by QM calculations. The same goes for B factors and TLS: to a large extent they are a mathematical construct whose purpose is to provide an optimal fit to the data. The connection of Bs and TLS with reality is tenuous at best, nevertheless people obviously would like to have a physical interpretation such as rigid-body correlated motion. The fact that Bragg scattering provides no information about correlated motion (you need to measure the diffuse scattering for that) doesn't seem to deter them!

I have no doubt in my mind that it is the mathematical model that should be published, because hopefully it's the best available interpretation of the data. Whether that involves publishing the riding H atoms explicitly, or alternatively the formulae and parameters that were used to calculate their positions I don't mind, as long as I can faithfully reproduce the Fcalcs to check the validity of the model. Then users of the PDB are free to *interpret* the mathematical models as physical models in an appropriate manner (e.g. by adjusting the bond lengths to H), and crystallographers have the untainted mathematical models needed to reproduce the Fcalcs.

so, wouldn't the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for whatever purpose they are needed, irrespective of adding riding hydrogens or not.

Best regards,
Dirk.

--
***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone: +49-89-2180-76845
Fax: +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW: www.genzentrum.lmu.de
***
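Ian's remark that users can reinterpret the mathematical model "by adjusting the bond lengths to H" amounts to a small geometric operation, sketched below. The 0.86 and 1.00 Ang values are the ones quoted in his message; the function name `rescale_bond` and the coordinates are hypothetical, and the sketch simply slides the hydrogen along the existing bond direction without touching anything else.

```python
import math


def rescale_bond(heavy_atom, hydrogen, target_length):
    """Move the hydrogen along the existing X-H direction to the target length."""
    direction = [h - a for a, h in zip(heavy_atom, hydrogen)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("heavy atom and hydrogen coincide; direction undefined")
    scale = target_length / length
    return tuple(a + d * scale for a, d in zip(heavy_atom, direction))


# Made-up coordinates: a riding H placed 0.86 Ang from its parent N
n_atom = (10.000, 5.000, 2.000)
h_riding = (10.000, 5.860, 2.000)

# "Physical" interpretation: same direction, internuclear distance 1.00 Ang
h_physical = rescale_bond(n_atom, h_riding, 1.00)
print(h_physical)
```

The deposited (riding) position would stay as the record of what was refined; the rescaled position is a derived interpretation for downstream use.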
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 01:25:12 am Dirk Kostrewa wrote: so, wouldn't be the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for what ever purpose they are needed, irrespective of adding riding hydrogens or not. Now I'm confused. Isn't that already the recommended, if not required, practice? Ethan -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Dirk,

so, wouldn't the deposition of the final model's Fcalc, Phic (and their weights) along with the final coordinates be the best solution? The final Fcalc are our best model and can be used to reproduce the final statistics (which would remove the sfcheck annoyance) and to reproduce the final electron density maps, and the coordinates can be used for whatever purpose they are needed, irrespective of adding riding hydrogens or not.

it is a great idea and if you look in PDB deposited structure factors there are a number of them (but certainly not the majority) that are accompanied by Fcalc. However, a few things to keep in mind:

- Imagine a (not very uncommon, unfortunately) situation when someone obtains the final model and Fcalc, and then, right before the PDB deposition, does a final check in Coot, and moves/removes a few atoms (a few waters, for instance) here and there. Or maybe does a real-space fit of a residue. Or removes H, if present. Or renames a ligand by request of PDB staff and accidentally changes an atom parameter(s). All this in turn will invalidate the R-factors and make previously calculated Fcalc inconsistent with such a manipulated model. So, the bottom line is: having a model that you can use to reproduce the reported statistics is important (for validation and database sanity at least, even if someone believes that such minor things wouldn't impair the biological interpretation - the ultimate goal of protein structures).

- To reproduce the most commonly used electron density maps, such as 2mFo-DFc and mFo-DFc, you would also need to deposit coefficients m and D, or, alternatively, have a program and free-R flags handy to compute m and D yourself.

- Requiring Fcalc, you would have to make sure that these are actually the total structure factors Fmodel = scales*(Fcalc_atoms + F_bulk_solvent) with all other appropriate scales included. Although, this is easy to check by computing the R-factor and comparing it with the reported number.

All the best!
Pavel.
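Pavel's last check (is the deposited Fcalc really the total Fmodel?) can be sketched as follows. This is a simplified illustration, not phenix.refine's scaling: it assumes a single scalar `scale` where real programs use resolution-dependent and anisotropic terms, it uses the conventional R = sum| |Fobs| - |Fmodel| | / sum|Fobs|, and all structure factors are made-up complex numbers.

```python
def f_model(f_calc_atoms, f_bulk_solvent, scale):
    """Total model structure factors: scale * (Fcalc_atoms + F_bulk_solvent)."""
    return [scale * (fa + fb) for fa, fb in zip(f_calc_atoms, f_bulk_solvent)]


def r_factor(f_obs, f_model_values):
    """R = sum| |Fobs| - |Fmodel| | / sum |Fobs| over all reflections."""
    numerator = sum(abs(fo - abs(fm)) for fo, fm in zip(f_obs, f_model_values))
    return numerator / sum(f_obs)


# Hypothetical complex structure factors for three reflections
f_atoms = [100 + 20j, 50 - 30j, 10 + 80j]
f_bulk = [-15 - 3j, -2 + 1j, 0.5j]
scale = 0.9

# Pretend the observed amplitudes match the full model exactly
f_obs = [abs(f) for f in f_model(f_atoms, f_bulk, scale)]

r_total = r_factor(f_obs, f_model(f_atoms, f_bulk, scale))
r_atoms_only = r_factor(f_obs, [scale * fa for fa in f_atoms])
print(r_total, r_atoms_only)
```

If the R-factor recomputed from the deposited columns matches the atoms-only number rather than the reported one, the bulk-solvent contribution (or a scale) was probably left out of what was deposited.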
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Ethan wrote:

I believe that deposition of Fc Phic FOM should be required. Certainly it should be the recommended practice.

For the same series of structures I just deposited, which started the riding H discussion, my mtz file had Fc Phic FOM + other data put out by Phenix - Pavel can elaborate. rcsb stripped almost all of this and the processed file has only: HKL, Flag, Fc, SigmaF and FOC :{

What's a structural biologist to do?

-- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Hi Mark,

I assume you deposited the mtz? This is what Ethan was referring to - the pdb does not do well with maintaining all the relevant columns when submitting the mtz file. However, if you convert your mtz to cif yourself and make sure it has all the columns you would like to include and then submit this cif file to the pdb, all the information is retained.

Eric

__
Eric Larson, PhD
Biomolecular Structure Center
Department of Biochemistry
Box 357742
University of Washington
Seattle, WA 98195

On Thu, 16 Sep 2010, Dr. Mark Mayer wrote:

Ethan wrote:

I believe that deposition of Fc Phic FOM should be required. Certainly it should be the recommended practice.

For the same series of structures I just deposited, which started the riding H discussion, my mtz file had Fc Phic FOM + other data put out by Phenix - Pavel can elaborate. rcsb stripped almost all of this and the processed file has only: HKL, Flag, Fc, SigmaF and FOC :{

What's a structural biologist to do?

-- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 09:56:14 am Dr. Mark Mayer wrote:

Ethan wrote:

I believe that deposition of Fc Phic FOM should be required. Certainly it should be the recommended practice.

For the same series of structures I just deposited, which started the riding H discussion, my mtz file had Fc Phic FOM + other data put out by Phenix - Pavel can elaborate. rcsb stripped almost all of this and the processed file has only: HKL, Flag, Fc, SigmaF and FOC :{

Huh? That's not a cif fragment. What file are you looking at?

In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif. Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course.

    loop_
    _refln.crystal_id
    _refln.wavelength_id
    _refln.scale_group_code
    _refln.status
    _refln.index_h
    _refln.index_k
    _refln.index_l
    _refln.F_meas_au
    _refln.F_meas_sigma_au
    _refln.intensity_meas
    _refln.intensity_sigma
    _refln.F_calc
    _refln.fom
    _refln.phase_meas

Caveat: I have never tried to deposit a structure factor file from phenix; maybe that triggers some other processing pathway. Does anyone here know?

I would say that the simple, and almost guaranteed to work, procedure is to do the cif conversion yourself and deposit the cif file. I noted in another message that the auto-conversion script on the PDB deposition site has a tendency to lose columns. That's why it is better to do the conversion yourself. I can't say that they _never_ lose columns in an uploaded cif file. I have had that happen, but only once and quite a while ago.

What's a structural biologist to do?

The empiricist's approach. Experiment till you find a procedure that works, then stick to it :-)

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
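Ethan's advice (convert to cif yourself, then confirm nothing was lost) suggests a quick column check on whatever structure factor file comes back. The sketch below is a deliberately naive text scan, not a real CIF parser: the helper names are invented, the `wanted` list is just one possible choice of columns, and `_refln.phase_calc` is assumed here to be the mmCIF tag for the calculated phase. The header text is the one quoted in Ethan's message.

```python
def refln_columns(cif_text):
    """Return the _refln.* tags found in a structure factor cif, in order.

    Naive scan: only looks at tags that start a line, which is how loop_
    headers are normally written. Not a substitute for a proper CIF parser.
    """
    columns = []
    for line in cif_text.splitlines():
        token = line.strip()
        if token.startswith("_refln."):
            columns.append(token.split()[0])
    return columns


def missing_columns(cif_text, wanted):
    """List the wanted tags that do not appear in the file."""
    present = set(refln_columns(cif_text))
    return [name for name in wanted if name not in present]


header = """loop_
_refln.crystal_id
_refln.wavelength_id
_refln.scale_group_code
_refln.status
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.intensity_meas
_refln.intensity_sigma
_refln.F_calc
_refln.fom
_refln.phase_meas
"""

# Illustrative wish list; adjust to whatever you intended to deposit
wanted = ["_refln.F_calc", "_refln.phase_calc", "_refln.fom"]
print(missing_columns(header, wanted))
```

Running the same check on the file before upload and on the processed file returned by the PDB would show exactly which columns were dropped along the way.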
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thu, Sep 16, 2010 at 10:19:14AM -0700, Ethan Merritt wrote:

[...]

What's a structural biologist to do?

The empiricist's approach. Experiment till you find a procedure that works, then stick to it :-)

... or the social approach: communicate with the person at the PDB responsible for your deposition. So far that has worked great for me (plaudits for the people at the PDB(e)).

Tim

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742

--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
[ccp4bb] Deposition of riding H: R-factor is overrated
Huh? That's not a cif fragment. What file are you looking at? In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif. Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course.

Come on guys - give me a break ... all I posted was just a list of the columns in the sf file - here's a cut and paste of what rcsb actually generated:

rcsb061284-sf.cif

    data_r3om0sf
    #
    _audit.revision_id     1_0
    _audit.creation_date   ?
    _audit.update_record   'Initial release'

    loop_
    _refln.wavelength_id
    _refln.crystal_id
    _refln.scale_group_code
    _refln.index_h
    _refln.index_k
    _refln.index_l
    _refln.status
    _refln.F_meas_au
    _refln.F_meas_sigma_au
    _refln.fom
    1 1 1 0 0  8 o 203.0 6.3 0.99
    1 1 1 0 0 10 o 281.5 8.7 0.86

Below is mtzdmp of what I actually deposited (as MTZ):

    Col Sort       Min      Max     Num        %      Mean      Mean   Resolution  Type Column
    num order                   Missing complete                abs.    Low   High       label
      1 ASC          0       46       0   100.00      17.7      17.7  31.88  1.40   H   H
      2 NONE         0       72       0   100.00      27.4      27.4  31.88  1.40   H   K
      3 NONE         0       81       0   100.00      30.5      30.5  31.88  1.40   H   L
      4 NONE       3.3   2160.3       0   100.00    162.89    162.89  31.88  1.40   F   FOBS
      5 NONE       0.9     60.0       0   100.00      5.36      5.36  31.88  1.40   Q   SIGFOBS
      6 NONE       0.0      1.0       0   100.00      0.05      0.05  31.88  1.40   I   R_FREE_FLAGS
      7 NONE       0.1   2253.6       0   100.00    157.73    157.73  31.88  1.40   F   FMODEL
      8 NONE    -180.0    180.0       0   100.00      2.65     90.13  31.88  1.40   P   PHIFMODEL
      9 NONE       0.0   5823.1       0   100.00    219.29    219.29  31.88  1.40   F   FCALC
     10 NONE    -180.0    180.0       0   100.00      3.24     90.09  31.88  1.40   P   PHIFCALC
     11 NONE       0.0  15330.0       0   100.00    141.04    141.04  31.88  1.40   F   FMASK
     12 NONE    -180.0    180.0       0   100.00      4.29     90.74  31.88  1.40   P   PHIFMASK
     13 NONE       0.0   6909.4       0   100.00     15.42     15.42  31.88  1.40   F   FBULK
     14 NONE    -180.0    180.0       0   100.00      4.29     90.74  31.88  1.40   P   PHIFBULK
     15 NONE     0.803    1.199       0   100.00     1.004     1.004  31.88  1.40   W   FB_CART
     16 NONE     0.001    1.000       0   100.00     0.877     0.877  31.88  1.40   W   FOM
     17 NONE     0.576    0.754       0   100.00     0.705     0.705  31.88  1.40   W   ALPHA
     18 NONE       277.388            0   100.00  5655.391  5655.391  31.88  1.40   W   BETA

-- Mark
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Thursday 16 September 2010 10:34:14 am Dr. Mark Mayer wrote:

Huh? That's not a cif fragment. What file are you looking at? In my experience the PDB feeds back to you a cif format structure factor file with a name like rcsb054058-sf.cif. Near the top of that file you should find a description of the data columns. The columns present depend on what you fed it, of course.

Come on guys - give me a break ... all I posted was just a list of the columns in the sf file

I sincerely apologize. Believe it or not, I mistook your emoticon for part of a file syntax that I was not familiar with.

HKL, Flag, Fc, SigmaF and FOC :{

I thought that colon + curly bracket was some funky data delimiter.

Ethan

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:

I wouldn't dare call a model manipulation that typically changes the R-factor by 0.5 ... ~2% nothing. Although, you may be right - who cares?

It's not a manipulation because no parameters were manipulated in the model. Don't you agree that using the riding model does not add additional refinable parameters?

But your insistence has awakened my curiosity. So I looked at hydrogens as produced by phenix.refine for a 1.8A structure I randomly picked. Just as George has pointed out, the covalent bonds are too short. For instance, when hydrogens are added, the average N-H distance is 1.1(5), but upon refinement the value is down to 0.85998(4). I won't even begin discussing the fact that some of these hydrogens added to K, Y, S etc. are placed in positions that are not justified by data (not in definitely wrong positions either, it's just that there is no evidence to support a particular torsion angle). And that it is unlikely that every histidine in the structure is fully protonated.

Do you see the problem? I fully understand your desire to be able to reproduce the R-factors (although I don't necessarily share it), but if I decide to deposit this model with hydrogens, am I essentially stating that the N-H bond is magically shortened to ~0.86A?

Sure, it is the driver's (PDB user's) responsibility to know the meaning of the red light (riding hydrogens), but wouldn't depositing riding hydrogens be equivalent to putting a 70 mph sign at the ramp, just because all the cops know that it's not the actual safe speed? And then telling the accident victim that there was fine print in the rule book? I think this situation is particularly problematic given that these days some enter the field the same way many people (at least so it seems here in Baltimore) get their driver's licenses, i.e. without ever learning the rules.

Cheers, Ed.

--
I'd jump in myself, if I weren't so good at whistling.
Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed,

On 9/15/10 12:54 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:

I wouldn't dare call a model manipulation that typically changes the R-factor by 0.5 ... ~2% nothing. Although, you may be right - who cares?

It's not a manipulation because no parameters were manipulated in the model.

I can't agree with this, sorry. A change to the model content (especially one that changes Fcalc) is a model manipulation.

Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On 9/15/10 3:54 PM, Ed Pozharski wrote:

Don't you agree that using the riding model does not add additional refinable parameters? (snip) for instance, when hydrogens are added, the average N-H distance is 1.1(5), but upon refinement the value is down to 0.85998(4).

So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe).

Obviously this question is not one amenable to a simple answer. In some sense (as per George) riding hydrogens are merely a restraint. In some other sense they are fundamentally a part of the model - they have very directional properties via bumping restraints that most certainly alter the atomic model for the heavy atoms in a very direct way via collision. Since the nature of these atoms - locationally specific - differs from the more amorphous extended atom restraints (CH3E for methyl in CNS etc.) it could make sense to include them in the model at deposition. As far as I know we do not delete atoms from the final model that contribute to scattering and geometric restraints under any other circumstances, except perhaps in the nearly-as-contentious "how do I model my disordered side-chain" case. Also not amenable to a simple answer.

Both approaches (REFMAC-esque and PHENIX-esque) have their merits. I doubt I'm the only person here conflicted over what to do about it. However this thread appears to have reached the point where not much new ground is being broken.

Phil Jeffrey
Princeton
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 13:13 -0700, Pavel Afonine wrote: I can't agree with this, sorry. A change to a model content (especially the one that changes Fcalc) is a model manipulation. That is not what I asked. Do you agree that using the riding model does not add additional refinable parameters? -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
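To make the "no additional refinable parameters" point concrete, here is a minimal Python sketch of what riding means: the hydrogen position is a pure function of the heavy-atom coordinates plus a fixed bond length. This is an illustration under stated assumptions, not the algorithm of phenix.refine or REFMAC. The function name `riding_amide_h` is made up, the bisector construction for a backbone amide H is a simplification, and the 0.86 default is just the refined value quoted earlier in the thread.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def riding_amide_h(n, c_prev, ca, bond_length=0.86):
    """Place a backbone amide H from its neighbours (hypothetical helper).

    The H sits bond_length away from N, pointing opposite to the bisector
    of the N->C(prev) and N->CA directions. Nothing here is refined: move
    the heavy atoms and the hydrogen follows.
    """
    to_c = _unit(tuple(a - b for a, b in zip(c_prev, n)))
    to_ca = _unit(tuple(a - b for a, b in zip(ca, n)))
    direction = _unit(tuple(-(a + b) for a, b in zip(to_c, to_ca)))
    return tuple(p + bond_length * d for p, d in zip(n, direction))


if __name__ == "__main__":
    half_sqrt3 = math.sqrt(3.0) / 2.0
    n = (0.0, 0.0, 0.0)
    c_prev = (-0.5, half_sqrt3, 0.0)
    ca = (-0.5, -half_sqrt3, 0.0)
    print(riding_amide_h(n, c_prev, ca))
```

The only inputs are coordinates that are already in the model and a tabulated distance, which is why the parameter count does not change. Whether adding atoms that change Fcalc still counts as a "manipulation" is a separate question that the sketch does not settle.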
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
I should just like to point out that the main source of the disagreement here seems to be that people have very different ideas about what a 'model' is or should be. Strictly a model is a purely mathematical construct; in this case it consists of the appropriate equation for the calculated structure factor and the best-fit values of the various parameters (scattering factors, atomic positions, occupancies, B factors, TLS parameters etc.) that appear in it. A mathematical model is inevitably going to be an imperfect representation of reality, but hopefully it's the best one we can come up with, in the sense of best explaining the data without significant overfitting.

The problem arises because many users of the PDB, and I suspect many contributors to this BB, particularly non-crystallographers, don't see it like that, because they view a PDB file as a physical model, i.e. not as the best fit to the data (assuming that the non-crystallographers even know what the data are!), but as the closest representation of reality.

The difference between the N-H bond lengths that Ed referred to illustrates the distinction between the mathematical and the physical model. The mathematical model requires that the bond length is 0.86 Ang because that value gives the best fit of the assumed spherical scattering factor of H to the deformation density of the X-H covalent bond. The physical model requires that it be 1.00 Ang because that is the internuclear distance found by spectroscopic methods and predicted by QM calculations.

The same goes for B factors and TLS: to a large extent they are a mathematical construct whose purpose is to provide an optimal fit to the data. The connection of Bs and TLS with reality is tenuous at best; nevertheless people obviously would like to have a physical interpretation such as rigid-body correlated motion. The fact that Bragg scattering provides no information about correlated motion (you need to measure the diffuse scattering for that) doesn't seem to deter them!

I have no doubt in my mind that it is the mathematical model that should be published, because hopefully it's the best available interpretation of the data. Whether that involves publishing the riding H atoms explicitly, or alternatively the formulae and parameters that were used to calculate their positions, I don't mind, as long as I can faithfully reproduce the Fcalcs to check the validity of the model. Then users of the PDB are free to *interpret* the mathematical models as physical models in an appropriate manner (e.g. by adjusting the bond lengths to H), and crystallographers have the untainted mathematical models needed to reproduce the Fcalcs.

Cheers

-- Ian

On Wed, Sep 15, 2010 at 9:13 PM, Pavel Afonine pafon...@lbl.gov wrote:

Dear Ed,

On 9/15/10 12:54 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 10:50 -0700, Pavel Afonine wrote:

I wouldn't dare call a model manipulation that typically changes the R-factor by 0.5 ... ~2% nothing. Although, you may be right - who cares?

It's not a manipulation because no parameters were manipulated in the model.

I can't agree with this, sorry. A change to the model content (especially one that changes Fcalc) is a model manipulation.

Pavel.
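Ian's suggestion that users "interpret" the deposited model by adjusting the bond lengths to H can be sketched in a few lines of Python. This is only an illustration: the function name `set_xh_distance` is hypothetical, the 1.00 Ang default is the internuclear N-H value quoted in his message, and other X-H bonds would need their own targets. The hydrogen is slid along the existing X-H direction, so the deposited (mathematical) coordinates are left untouched and only a derived copy changes.

```python
import math


def set_xh_distance(parent, hydrogen, target=1.00):
    """Return a new H position at `target` Ang from `parent`.

    Hypothetical helper: keeps the X-H direction from the deposited model
    and changes only the bond length.
    """
    delta = [h - p for h, p in zip(hydrogen, parent)]
    length = math.sqrt(sum(c * c for c in delta))
    if length == 0.0:
        raise ValueError("parent and hydrogen coincide; direction undefined")
    scale = target / length
    return tuple(p + scale * c for p, c in zip(parent, delta))


if __name__ == "__main__":
    n_atom = (10.0, 5.0, 2.0)
    h_refined = (10.86, 5.0, 2.0)  # 0.86 Ang from N, as in the riding model
    print(set_xh_distance(n_atom, h_refined))
```

Because the transformation is one-way and cheap, it supports the argument that the model reproducing Fcalc is the one to deposit, with the physical interpretation derived downstream.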
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:

So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe).

My impression is that the default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output.

Without breaking any new ground, there is really no conflict here. Is it a good idea to make a complete model description (including riding hydrogens, input files, cif-files, special case restraints etc.) available for structures deposited in the PDB? Absolutely. But not in this form, where the model implies that we know the protonation states of all the atoms and has unreasonable geometry. For the example that I provided, the rmsd_bonds for that particular group is 0.14A, certainly unacceptable.

Maybe one can use a different record for these atoms, say RIDING instead of ATOM. Thus the complete model can be recovered and at the same time the nature of these items is explicitly stated. In this way riding hydrogens are clearly distinguished from those that are actually refined at ultrahigh resolution.

Cheers, Ed.

--
I'd jump in myself, if I weren't so good at whistling.
Julian, King of Lemurs
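The RIDING idea is easy to prototype, since the record name occupies a fixed six-column field in a PDB line. A minimal Python sketch follows. To be clear, RIDING is Ed's proposal and not an existing PDB record type; the function names are invented, and the sketch assumes the element symbol is present in columns 77-78, which is not true of every file.

```python
def _atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Build a fixed-column ATOM line for the demo (hypothetical helper)."""
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            "          %2s") % (serial, name, resname, chain, resseq,
                                x, y, z, 1.00, 20.00, element)


def tag_riding_hydrogens(pdb_lines):
    """Relabel hydrogen ATOM records as the proposed RIDING record."""
    tagged = []
    for line in pdb_lines:
        element = line[76:78].strip().upper()  # element field, columns 77-78
        if line.startswith("ATOM  ") and element == "H":
            line = "RIDING" + line[6:]  # same width, so columns stay aligned
        tagged.append(line)
    return tagged


def untag_riding_hydrogens(pdb_lines):
    """Recover the complete model by turning RIDING back into ATOM."""
    return ["ATOM  " + line[6:] if line.startswith("RIDING") else line
            for line in pdb_lines]


DEMO = [
    _atom_line(1, " N", "ALA", "A", 1, 10.000, 5.000, 2.000, "N"),
    _atom_line(2, " H", "ALA", "A", 1, 10.860, 5.000, 2.000, "H"),
]

if __name__ == "__main__":
    for line in tag_riding_hydrogens(DEMO):
        print(line)
```

Existing software that reads only ATOM and HETATM records would most likely skip the relabelled lines, which is the intended effect, while anyone wanting to reproduce Fcalc can restore them without loss. Hydrogens actually refined at ultrahigh resolution would need to be excluded from the relabelling by some other flag, which this sketch does not attempt.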
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed, On 9/15/10 2:47 PM, Ed Pozharski wrote: On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote: So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe). My impression is that default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output. No, if your input file contains H atoms, the output file will contain them too (in phenix.refine). You don't have to change any parameters for this. Pavel.
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Sure. But if I start with a model that has no hydrogens, they will be generated but not passed to the output, right? Just like refmac.

On Wed, 2010-09-15 at 14:52 -0700, Pavel Afonine wrote:

Dear Ed,

On 9/15/10 2:47 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:

So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe).

My impression is that the default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output.

No, if your input file contains H atoms, the output file will contain them too (in phenix.refine). You don't have to change any parameters for this.

Pavel.

--
I'd jump in myself, if I weren't so good at whistling.
Julian, King of Lemurs
Re: [ccp4bb] Deposition of riding H: R-factor is overrated
Dear Ed,

no, if you start with a model that has no hydrogens, they will not be generated internally.

Pavel.

On 9/15/10 2:58 PM, Ed Pozharski wrote:

Sure. But if I start with a model that has no hydrogens, they will be generated but not passed to the output, right? Just like refmac.

On Wed, 2010-09-15 at 14:52 -0700, Pavel Afonine wrote:

Dear Ed,

On 9/15/10 2:47 PM, Ed Pozharski wrote:

On Wed, 2010-09-15 at 16:26 -0400, Phil Jeffrey wrote:

So the riding hydrogen model is imperfect. At least with phenix.refine you can measure it, unlike the default behavior of REFMAC. (But you can tell it to write hydrogens out, I believe).

My impression is that the default behavior of phenix.refine is the same - I had to change parameters to include hydrogens in the output.

No, if your input file contains H atoms, the output file will contain them too (in phenix.refine). You don't have to change any parameters for this.

Pavel.