Re: [ccp4bb] Concerns about statistics
On 06/14/2013 07:00 AM, John R Helliwell wrote: Alternatively, at poorer resolutions than that, you can monitor if the Cruickshank-Blow Diffraction Precision Index (DPI) improves or not as more data are steadily added to your model refinements. Dear John, unfortunately the behavior of DPIfree is less than satisfactory here: in a couple of cases I looked at, it just steadily improves with resolution. The example I have in front of me right now extends the resolution from 2.0A to 1.55A, and DPIfree goes down from ~0.17A to 0.09A at an almost constant pace (it slows down from 0.021 A/0.1A to 0.017 A/0.1A around 1.75A). Notice that in this specific case I/sigI at 1.55A is ~0.4 and CC(1/2)~0.012 (even this non-repentant big-endian couldn't argue there is good signal there). DPIfree is essentially proportional to Rfree * d^(2.5) (assuming that No ~ 1/d^3 and that Na and completeness do not change). To offset the resolution change and keep DPIfree from improving, Rfree would have to go up ~1.9 times, and obviously that is not going to happen no matter how much weak data I throw in. The maximum-likelihood e.s.u. reported by Refmac makes more sense in this particular case, as it clearly slows down big time around 1.77A (see https://plus.google.com/photos/113111298819619451614/albums/5889708830403779217). Coincidentally, Rfree also starts going up rapidly around the same resolution. If anyone is curious what I/sigI is at the breaking point, it's ~1.5, and CC(1/2)~0.6. And to bash Rmerge a little more, it's 112%. So there are two questions I am very much interested in here. a) Why is DPIfree so bad at this? Can we even believe it, given its erratic behavior in this scenario? b) I would normally set up a simple data-mining project to see how common this ML_esu behavior is, but there is no easily accessible source of data processed beyond I/sigI=2, let alone I/sigI=1 (are structural genomics folks reading this, and do they maybe have such data to mine?).
I can look into all of my own datasets, but that would be a biased selection of several crystal forms. Perhaps others have looked into this too; what are your observations? Or maybe you have a dataset processed way beyond I/sigI=1 and are willing either to share it with me together with a final model, or to run refinement at a bunch of different resolutions and report the result (I can provide bash scripts as needed). Cheers, Ed. -- Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
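Ed's scaling argument can be checked numerically. The sketch below assumes one commonly quoted form of the Rfree-based DPI, DPIfree = sqrt(Na/No) * C^(-1/3) * dmin * Rfree (Na atoms, No reflections, C fractional completeness); conventions differ by a constant prefactor, which does not affect the ratios computed here. The function names and the atom/reflection counts in the usage lines are hypothetical.

```python
import math


def dpi_free(n_atoms: float, n_obs: float, completeness: float,
             d_min: float, r_free: float) -> float:
    """Rfree-based Cruickshank DPI (Angstrom), up to a convention-dependent constant.

    Assumed form: sqrt(Na / No) * C**(-1/3) * d_min * Rfree
    """
    return math.sqrt(n_atoms / n_obs) * completeness ** (-1.0 / 3.0) * d_min * r_free


def rfree_ratio_for_constant_dpi(d_old: float, d_new: float) -> float:
    """Factor by which Rfree must grow to keep DPIfree constant when the
    resolution limit moves from d_old to d_new, assuming No ~ 1/d^3 and
    fixed Na and completeness (so that DPIfree ~ Rfree * d^2.5)."""
    return (d_old / d_new) ** 2.5


if __name__ == "__main__":
    # Ed's example: extending the data from 2.0 A to 1.55 A
    print(f"Rfree would need to rise {rfree_ratio_for_constant_dpi(2.0, 1.55):.2f}-fold")

    # Hypothetical model: 2000 atoms, 20000 reflections at 2.0 A, Rfree held at 0.25;
    # the reflection count grows as 1/d^3 when the limit moves to 1.55 A.
    n0 = 20000
    dpi_20 = dpi_free(2000, n0, 0.99, 2.0, 0.25)
    dpi_155 = dpi_free(2000, n0 * (2.0 / 1.55) ** 3, 0.99, 1.55, 0.25)
    print(f"DPIfree at 2.0 A: {dpi_20:.3f}  at 1.55 A: {dpi_155:.3f}")
```

The first line gives about 1.89, matching Ed's ~1.9. With Rfree held fixed, DPIfree shrinks by (1.55/2.0)^2.5, roughly 0.53, purely from the d-dependence, a factor close to the 0.17A to 0.09A drop Ed reports, which is why DPIfree keeps "improving" unless Rfree rises sharply.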
Re: [ccp4bb] Concerns about statistics
Dear Ed, Thank you for this. Indeed, I have not pushed into the domain of I/sigI as low as 0.4 or CC1/2 as low as 0.012, so I do not have an answer to your query at these extremes. But I concede that I am duly corrected by your example: my email did not specify how far one could push the investigation of the DPI plateau, and I was certainly not considering an extreme such as the one you have investigated. Best wishes, Yours sincerely, John Prof John R Helliwell DSc On 15 Jun 2013, at 15:31, Ed Pozharski epozh...@umaryland.edu wrote: [...]
Re: [ccp4bb] Concerns about statistics
Dear Andrea, I agree with Tim and still cut the resolution at I/sigma=2. In my experience, including higher resolution shells with poorer signal-to-noise never changed the apparent resolution of the electron density maps. In addition, the high resolution limit at I/sigma=2 coincides very well with the point where the Fo vs. Fo + Gauss(0,1)*sigma(Fo) correlation coefficient curve, reported by BUSTER, crosses the recommended lower limit of 0.9. And please note, CC*=0.5 corresponds to CC(1/2)=0.143. In my very limited experience, I/sigma=2 corresponds roughly to CC(1/2)~0.7. Although I'm very excited about the CC(1/2)/CC* paper by Karplus & Diederichs, I still prefer to be on the safe side until it has been verified in numerous cases that choosing high resolution cutoffs based on CC(1/2) really leads to higher resolution structures. The recommended procedure to include small resolution increments in refinement to decide the high resolution cutoff is very time-consuming. Best regards, Dirk. On 13.06.13 17:15, Andrea Edwards wrote: Hello group, I have some rather (embarrassingly) basic questions to ask. Mainly: when deciding the resolution limit, which statistics are the most important? I have always been taught that the highest resolution bin should be chosen with I/sig no less than 2.0, Rmerge no more than 40%, and %Completeness as high as possible. However, I am currently faced with a set of statistics that clearly fall outside these criteria. Is it acceptable to cut off the resolution at an I/sig as low as 1.5, as long as the completeness is greater than 75%? To put it another way: if % completeness is the new criterion for choosing your resolution limit (instead of Rmerge or I/sig), then what % completeness is too low to be considered? Also, I am aware that Rmerge increases with redundancy; is it acceptable to report Rmerge (or Rsym) values of 66% and 98%, with redundancies of 3.8 and 2.4, for the highest resolution bins of these crystals? I appreciate any comments.
-A -- *** Dirk Kostrewa Gene Center Munich Department of Biochemistry Ludwig-Maximilians-Universität München Feodor-Lynen-Str. 25 D-81377 Munich Germany Phone: +49-89-2180-76845 Fax: +49-89-2180-76999 E-mail: kostr...@genzentrum.lmu.de WWW: www.genzentrum.lmu.de ***
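Dirk's correspondence between CC* = 0.5 and CC(1/2) = 0.143 follows from the Karplus & Diederichs relation CC* = sqrt(2*CC(1/2) / (1 + CC(1/2))). A minimal sketch of the conversion and its algebraic inverse; the function names are illustrative, and the sketch only accepts values in [0, 1], where the square root is defined.

```python
import math


def cc_star(cc_half: float) -> float:
    """CC* = sqrt(2 CC1/2 / (1 + CC1/2)), restricted here to 0 <= CC1/2 <= 1."""
    if not 0.0 <= cc_half <= 1.0:
        raise ValueError("CC1/2 must lie in [0, 1] for this formula")
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def cc_half_from_cc_star(cc_st: float) -> float:
    """Inverse relation: CC1/2 = CC*^2 / (2 - CC*^2)."""
    if not 0.0 <= cc_st <= 1.0:
        raise ValueError("CC* must lie in [0, 1]")
    return cc_st ** 2 / (2.0 - cc_st ** 2)


if __name__ == "__main__":
    print(f"CC1/2 for CC* = 0.5: {cc_half_from_cc_star(0.5):.3f}")
    for cc in (0.012, 0.143, 0.6, 0.7):  # CC1/2 values mentioned in this thread
        print(f"CC1/2 = {cc:.3f} -> CC* = {cc_star(cc):.3f}")
```

The inverse makes it easy to see that the often-quoted CC(1/2) threshold of 0.143 is simply the CC* = 0.5 point expressed on the CC(1/2) scale.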
Re: [ccp4bb] Concerns about statistics
On 06/14/2013 11:43 AM, Dirk Kostrewa wrote: [...] The recommended procedure to include small resolution increments in refinement to decide the high resolution cutoff is very time-consuming. ... and very subjective: noise can look very unnoisy if you know what you are looking for! Best, Tim -- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
Re: [ccp4bb] Concerns about statistics
BTW, there's also an earlier paper (properly cited in Karplus & Diederichs 2012) showing the benefit of weak 'high-resolution' reflections. Acta Crystallogr D Biol Crystallogr. 2010 Sep;66(Pt 9):988-1000. doi: 10.1107/S0907444910029938. Epub 2010 Aug 13. Inclusion of weak high-resolution X-ray data for improvement of a group II intron structure. Wang J. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA. jimin.w...@yale.edu Abstract It is common to report the resolution of a macromolecular structure with the highest resolution shell having an averaged I/sigma(I) ≥ 2. Data beyond the resolution thus defined are weak and often poorly measured. The exclusion of these weak data may improve the apparent statistics and also leads to claims of lower resolutions that give some leniency in the acceptable quality of refined models. However, the inclusion of these data can provide additional strong constraints on atomic models during structure refinement and thus help to correct errors in the original models, as has recently been demonstrated for a protein structure. Here, an improved group II intron structure is reported arising from the inclusion of these data, which helped to define more accurate solvent models for density modification during experimental phasing steps. With the improved resolution and accuracy of the experimental phases, extensive revisions were made to the original models such that the correct tertiary interactions of the group II intron that are essential for understanding the chemistry of this ribozyme could be described. Best wishes Roberto On 14 Jun 2013, at 10:43, Dirk Kostrewa kostr...@genzentrum.lmu.de wrote: [...] Roberto A. Steiner Group Leader Randall Division of Cell and Molecular Biophysics King's College London roberto.stei...@kcl.ac.uk Room 3.10A New Hunt's House Guy's Campus SE1 1UL London Phone 0044 20 78488216 Fax 0044 20 78486435
Re: [ccp4bb] Concerns about statistics
Dear Andrea, Checking the quality of the electron density maps as one adds more data has rightly been mentioned. In chemical crystallography one can monitor the sigmas of bond distances and angles as data are added, until adding more data at ever higher resolution causes them to deteriorate. The equivalent can be done in protein crystallography if the resolution is sufficient for the full-matrix inversion of least-squares model refinement to work. Alternatively, at poorer resolutions than that, you can monitor if the Cruickshank-Blow Diffraction Precision Index (DPI) improves or not as more data are steadily added to your model refinements. Quoting in your article the resolution at which I/sigI crosses 2, whilst not cutting the data at that point, can help the reader know the quality of the data you have worked with. Making the raw diffraction data images available would make things totally clear to the reader. Best wishes, John Professor John R Helliwell DSc On Thu, Jun 13, 2013 at 4:15 PM, Andrea Edwards edwar...@stanford.edu wrote: [...]
Re: [ccp4bb] Concerns about statistics
In their paper, K&D monitored the electron density for their cofactor and could verify that adding higher resolution shells based on the CC1/2 statistics improved the way it looked. I'm not sure they monitored bond distances and/or e.s.d.'s, but those may well have been affected by restraints and weights anyway at the resolution they worked at (~1.45 A?). It might be more difficult to judge the effect of including higher resolution shells if there isn't a feature that is easy to monitor as you increase the resolution. In one of the cases that I'm working on, I certainly noticed better geometry and e.d. for the cofactor upon adding (somewhat) higher resolution shells. Cheers, Boaz Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Steiner, Roberto [roberto.stei...@kcl.ac.uk] Sent: Friday, June 14, 2013 12:58 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] Concerns about statistics [...]
Re: [ccp4bb] Concerns about statistics
Hi Boaz, The improvement you see in the cofactor geometry after inclusion of higher resolution data is very interesting, but is it possible that this is a secondary effect resulting from the additional X-ray data changing the relative weighting of the X-ray and stereochemical terms in the refinement? Could you get a similar geometry improvement simply by changing the relative weighting (using only the original data), or would that only come at a penalty in other statistics? Did you quantify the improvement in the cofactor density, for example by a correlation coefficient? Cheers, Andrew On 14 Jun 2013, at 13:45, Boaz Shaanan bshaa...@exchange.bgu.ac.il wrote: [...]
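Andrew's suggestion of quantifying the cofactor density with a correlation coefficient could, for instance, use a real-space correlation: the Pearson correlation between observed-map and model-map values at grid points around the cofactor. The sketch assumes both maps have already been sampled on the same grid points (how those points are chosen is left open), and the density values in the usage lines are made up.

```python
import math
from typing import Sequence


def real_space_cc(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Pearson correlation between two density samples on identical grid points."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0.0 or var_c == 0.0:
        raise ValueError("correlation is undefined for a flat map")
    return cov / math.sqrt(var_o * var_c)


if __name__ == "__main__":
    # Made-up density values at the same cofactor grid points, from maps computed
    # before and after adding higher-resolution shells, versus the model map.
    model = [0.1, 0.8, 1.5, 1.1, 0.3, -0.2]
    before = [0.3, 0.5, 1.0, 1.2, 0.6, 0.1]
    after = [0.2, 0.7, 1.4, 1.1, 0.4, -0.1]
    print(f"RSCC before: {real_space_cc(before, model):.2f}")
    print(f"RSCC after:  {real_space_cc(after, model):.2f}")
```

Because a correlation coefficient is unchanged by a positive overall scale or offset, this comparison is less sensitive to map normalisation than comparing raw density levels.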
Re: [ccp4bb] Concerns about statistics
On Thu, Jun 13, 2013 at 8:15 AM, Andrea Edwards edwar...@stanford.edu wrote: [...] when deciding the resolution limit, which statistics are the most important? [...] A (probably) better way: http://www.ncbi.nlm.nih.gov/pubmed/22628654 Short version: don't try to use simplistic rules; instead, use all data that actually improve the model. In practice, what I've noticed in some recent articles is (paraphrasing): "data extend to 2.5Å with an I/sigma of 2 in the highest-resolution shell, but we used data to 2.2Å as suggested by Karplus & Diederichs." This allows you to actually use as much data as possible while still (hopefully) pleasing any pedantic reviewers. (Substitute 90% completeness or an R-merge of whatever for the I/sigma cutoff if you prefer; the end result will still be the same.) -Nat
Re: [ccp4bb] Concerns about statistics
So in this case, should the authors report a correlation coefficient along with the other standard statistics (I/sigI, Rmerge, completeness, redundancy, etc.)? What about Rpim instead of Rmerge? And if Rpim is reported, what should the criterion for the resolution cutoff be? Also, if this paper is the new standard, how should we regard the statistics already reported in the literature? Or, more importantly, how do we go about reviewing current literature that does not report this statistic? - Original Message - From: Klaus Fütterer k.futte...@bham.ac.uk To: Andrea Edwards edwar...@stanford.edu Sent: Thursday, June 13, 2013 8:27:33 AM Subject: Re: [ccp4bb] Concerns about statistics The commonly accepted answer is in "Linking crystallographic model and data quality". Karplus PA, Diederichs K. Science. 2012 May 25;336(6084):1030-3. doi: 10.1126/science.1218231. Best wishes, Klaus Fütterer === Dr. Klaus Fütterer Deputy Head of School Undergraduate Admissions Room 717, Biosciences Tower, School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK P: +44-(0)-121-414 5895 F: +44-(0)-121-414 5925 E: k.futte...@bham.ac.uk W: http://tinyurl.com/futterer-lab === On 13 Jun 2013, at 16:15, Andrea Edwards wrote: [...]
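Andrea's question about Rpim, and her remark that Rmerge increases with redundancy, can be made concrete with the standard definitions: Rmerge = sum over hkl and i of |I_i - <I>| divided by the sum of I_i, while Rmeas multiplies each reflection's summed deviations by sqrt(n/(n-1)) and Rpim by sqrt(1/(n-1)), n being that reflection's multiplicity. A minimal sketch; the dictionary layout and the `simulate` helper are hypothetical, and singly measured reflections are simply skipped.

```python
import math
import random
from typing import Dict, List, Tuple

Hkl = Tuple[int, int, int]


def merging_r_factors(obs: Dict[Hkl, List[float]]) -> Tuple[float, float, float]:
    """Return (Rmerge, Rmeas, Rpim) for unmerged intensities grouped by unique hkl."""
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in obs.values():
        n = len(intensities)
        if n < 2:
            continue  # a single measurement carries no agreement information
        mean_i = sum(intensities) / n
        dev = sum(abs(i - mean_i) for i in intensities)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev   # multiplicity-corrected
        num_pim += math.sqrt(1.0 / (n - 1)) * dev  # precision of the merged mean
        denom += sum(intensities)
    if denom == 0.0:
        raise ValueError("no multiply measured reflections")
    return num_merge / denom, num_meas / denom, num_pim / denom


def simulate(multiplicity: int, n_unique: int = 2000, seed: int = 0):
    """Toy data: every reflection has true I = 1000 and Gaussian error sigma = 300."""
    rng = random.Random(seed)
    obs = {(h, 0, 0): [rng.gauss(1000.0, 300.0) for _ in range(multiplicity)]
           for h in range(n_unique)}
    return merging_r_factors(obs)


if __name__ == "__main__":
    for m in (2, 4, 8):
        r_merge, r_meas, r_pim = simulate(m)
        print(f"multiplicity {m}: Rmerge={r_merge:.3f} Rmeas={r_meas:.3f} Rpim={r_pim:.3f}")
```

With the same measurement error, Rmerge should creep up with multiplicity while Rmeas stays roughly constant and Rpim falls, which is why Rmerge values at different redundancies (3.8 vs 2.4 in Andrea's case) are not directly comparable.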
Re: [ccp4bb] Concerns about statistics
...and Rmerge seems to be meaningless for judging data quality? - Original Message - From: Klaus Fütterer k.futte...@bham.ac.uk To: Andrea Edwards edwar...@stanford.edu Sent: Thursday, June 13, 2013 8:49:13 AM Subject: Re: [ccp4bb] Concerns about statistics Seems you are reviewing a paper at present. If this is indeed the case, it is fair to ask the authors to supply CC1/2 for their data, or to rationalise their high-resolution cut-off in light of that statistic. For older papers, you obviously can't do that. As always in statistics, there is no sharp line. My personal take is: I/sig of 1.5 in the high-res shell with at least 85% completeness (at that cut-off). 15 years ago, I would have said I/sigI of 3 with at least 75% completeness, from which you can see how arbitrary the figures are. The merit of the Karplus & Diederichs paper is to demonstrate changes in the electron density map in relation to cut-offs. Klaus On 13 Jun 2013, at 16:44, Andrea Edwards wrote: [...]
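Klaus's personal rule of thumb (I/sig of 1.5 in the high-resolution shell with at least 85% completeness) can be encoded as a simple scan over per-shell statistics. This is only a sketch: the thresholds are parameters precisely because, as he notes, there is no sharp line, the `Shell` record and `resolution_limit` function are hypothetical, and the shell table in the usage lines is invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Shell:
    d_min: float          # high-resolution edge of the shell (Angstrom)
    i_over_sigma: float   # mean I/sigma(I) in the shell
    completeness: float   # fraction, 0..1


def resolution_limit(shells: Iterable[Shell], min_i_over_sigma: float = 1.5,
                     min_completeness: float = 0.85) -> Optional[float]:
    """Walk outward from low resolution and return the d_min of the last shell
    before the first one that fails either criterion (None if the first fails)."""
    limit = None
    for shell in sorted(shells, key=lambda s: s.d_min, reverse=True):
        if shell.i_over_sigma < min_i_over_sigma or shell.completeness < min_completeness:
            break
        limit = shell.d_min
    return limit


if __name__ == "__main__":
    table = [  # hypothetical per-shell statistics
        Shell(2.4, 10.0, 0.99), Shell(2.2, 4.0, 0.98), Shell(2.0, 2.1, 0.95),
        Shell(1.9, 1.6, 0.90), Shell(1.8, 1.2, 0.80),
    ]
    print(resolution_limit(table))                        # -> 1.9 (Klaus's thresholds)
    print(resolution_limit(table, min_i_over_sigma=2.0))  # -> 2.0 (the I/sigma = 2 rule)
```

Nat's point still applies: a scan like this only reports where a threshold is crossed, not whether the extra shells actually improve the model.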
Re: [ccp4bb] Concerns about statistics
Dear Andrea,

unless you are desperately longing for resolution, I normally cut the resolution where I/sigI drops to 2.0. You should, however, make up your own rules as to how to determine I/sigI: it must be computed in resolution shells, and if you choose the shells wide enough, the strong data might cover up for the weak ones. I usually run XSCALE or xprep and use their preset resolution shells (e.g. the shells in CORRECT.LP are wider than those in XSCALE).

Rmerge is obsolete, and I always encourage people to use Rmeas (aka Rrim, not Rpim!) instead. Publishing Rmerge is a little bit like saying "we have always gone for lunch at 1pm, so we will stick to that": it's a habit despite better knowledge. I do not consider Rmerge/Rmeas when deciding on the resolution cut-off, and I have used data with Rmeas > 200% provided I/sigI > 2.0.

Completeness does not really say much in terms of data quality: with a little tweaking of the parameters, most integration programs would give you about 99.9% completeness even if there is mostly noise on the detector!

The method suggested by Karplus and Diederichs works well, i.e. checking various resolution ranges, e.g. in steps of 0.2A, and looking at the maps. But it is time-consuming, and I would only apply it at low resolution, where +/- 0.1A can make quite a difference (actually in both directions: if you include too much noise, the maps become more difficult to interpret).

Best,
Tim

P.S.: Your question is not embarrassing; it is an ongoing discussion with no definite answer.

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
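Tim's point that I/sigI must be computed in resolution shells, and that wide shells let strong data mask weak data, can be shown with a small sketch. It assumes shells of equal reciprocal-space volume (equal steps in 1/d^3, a common but not universal choice), uses the mean of I/sigI as the shell statistic (programs differ here), and runs on hypothetical noise-free data whose I/sigI falls off as 20 * exp(-B / (2 d^2)) with B = 20.

```python
import math


def shell_statistics(reflections, d_max, d_min, n_shells):
    """Mean I/sigI per resolution shell.

    reflections: list of merged (d, I, sigI) tuples, d in Angstrom.
    Shells are equal steps in 1/d^3 (equal reciprocal-space volume), so each
    holds roughly the same number of reflections.
    Returns a list of (d_low, d_high, mean_I_over_sigI, count).
    """
    v_lo, v_hi = d_max ** -3, d_min ** -3
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for d, intensity, sig in reflections:
        if not d_min <= d <= d_max:
            continue
        k = int((d ** -3 - v_lo) / (v_hi - v_lo) * n_shells)
        k = min(k, n_shells - 1)  # a reflection exactly at d_min goes in the last shell
        sums[k] += intensity / sig
        counts[k] += 1
    edges = [(v_lo + (v_hi - v_lo) * k / n_shells) ** (-1 / 3) for k in range(n_shells + 1)]
    return [(edges[k], edges[k + 1],
             sums[k] / counts[k] if counts[k] else float("nan"), counts[k])
            for k in range(n_shells)]


def toy_reflections(n=6000, d_max=4.0, d_min=1.5, b_factor=20.0):
    """Hypothetical noise-free data: reflections evenly spaced in 1/d^3 with
    I/sigI = 20 * exp(-B / (2 d^2)) and sigI fixed at 1."""
    v_lo, v_hi = d_max ** -3, d_min ** -3
    refl = []
    for k in range(n):
        d = (v_lo + (v_hi - v_lo) * (k + 0.5) / n) ** (-1 / 3)
        refl.append((d, 20.0 * math.exp(-b_factor / (2 * d * d)), 1.0))
    return refl


refl = toy_reflections()
for n_shells in (5, 20):
    d_low, d_high, mean_ios, count = shell_statistics(refl, 4.0, 1.5, n_shells)[-1]
    print(f"{n_shells:2d} shells: outer shell {d_low:.2f}-{d_high:.2f} A, "
          f"<I/sigI> = {mean_ios:.2f} ({count} reflections)")
```

Both runs end at the same 1.5A limit, but with 5 shells the outermost shell reaches further into stronger, lower-resolution data, so its mean I/sigI comes out higher than with 20 shells. The same data can therefore appear to pass or fail an I/sigI threshold depending on the binning alone.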
Re: [ccp4bb] Concerns about statistics
On Thu, 2013-06-13 at 08:44 -0700, Andrea Edwards wrote:
> In this case, the author should report a correlation coefficient along with the other standard statistics (I/sigI, Rmerge, completeness, redundancy, etc.)?

Won't hurt.

> What about Rpim instead of Rmerge? And if Rpim is reported, what should be the criterion for the resolution cutoff?

Rmerge has been known to be deeply flawed for ~15 years. IMHO, it should not be reported at all. While Rpim is better, the whole point of Karplus & Diederichs is that R-type measures are not very useful for deciding the resolution cutoff.

> Also, if this paper is the new standard, how should we regard the statistics reported in the literature?

We should keep in mind that conservative resolution cutoff criteria have been used in the field for decades.

> Or, more importantly, how do we go about reviewing current literature that does not report this statistic?

Structures refined with an I/sigma=2 cutoff should be considered likely to have been refined at a resolution that was cut off too conservatively. With that said, I am pretty sure that in the vast majority of cases, the structural conclusions derived with an I/s=2 vs CC1/2=0.5 vs DR=0 cutoff will be essentially the same.

--
I'd jump in myself, if I weren't so good at whistling.
Julian, King of Lemurs
Re: [ccp4bb] Concerns about statistics
On 06/13/2013 06:16 PM, Ed Pozharski wrote:
> [...] With that said, I am pretty sure that in the vast majority of cases structural conclusions derived with I/s=2 vs CC1/2=0.5 vs DR=0 cutoff will be essentially the same.

Hi Ed,

in my experience, CC(1/2) ~ 0.7 corresponds quite well to I/sigI ~ 2.0, rather than CC(1/2) ~ 0.5 (again, with the default resolution shells from xprep, which also plots CC(1/2) vs. resolution). Are the above numbers based on experience, too? If so, which program do you usually use to look at these statistics?

Tim

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
Re: [ccp4bb] Concerns about statistics
Tim,

my personal preference has always been I/sigI=1. In my Scalepack days, I always noticed that ~30% of the reflections in the I/sigI=1 shells had I/sigI > 2, and I formed an unverified belief that there should be some information there.

In my experience, CC1/2=0.5 would normally yield I/sigI~1, not 2. This is based predominantly on Scala/Aimless.

Cheers,
Ed.

--
I don't know why the sacrifice thing didn't work. Science behind it seemed so solid.
Julian, King of Lemurs
Re: [ccp4bb] Concerns about statistics
Hi Andrea,

Any choice you make about a resolution cut-off based on a rule of thumb can be called into question by a referee who uses a different rule of thumb. So if you choose a metric + cut-off that is anything less than very conservative (say I/sigI 1), you have to be able to defend your choice either with a reference or with evidence from experiments. This is where the 'paired refinement' of the Karplus and Diederichs paper kicks in: you can show that you can get useful information out of the extra high-resolution reflections by comparing refinement results.

So what you can do is first solve your structure, then build and refine using a conservative resolution cut-off. Once you are nearing the final stages of the process, you can gradually go to higher resolution using the paired refinement procedure. That way you have some results to support your choice of resolution cut-off. Who knows, when you reach the best resolution cut-off you may be able to add some details to your structure model that you would have missed otherwise.

If you think that doing the paired refinement is too much work, you can try PDB_REDO. If you give it a PDB file with a resolution cut-off in REMARK 2 or 3 that is lower than the maximal resolution of your reflection file, it will automatically use paired refinement to find the best resolution cut-off (yes, this is a self-plug!).

HTH,
Robbie Joosten
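The stepwise procedure Robbie describes can be sketched as a loop. Start from a conservative cutoff and extend it one step at a time. At each step, compare the model refined with the extra shell against the previous model, scoring both on the same lower-resolution reflection set. This is only the decision logic. The refinement itself is left to a hypothetical callback `r_free_at`, which in practice would wrap runs of your refinement program. The single criterion used here, that Rfree at the common resolution must not rise, is a simplification: Karplus & Diederichs and PDB_REDO may weigh further statistics such as Rwork.

```python
from typing import Callable, Sequence


def paired_refinement(cutoffs: Sequence[float],
                      r_free_at: Callable[[float, float], float],
                      tolerance: float = 0.0) -> float:
    """Step from a conservative resolution cutoff towards higher resolution.

    cutoffs: resolution limits in Angstrom, conservative first,
        e.g. [2.0, 1.9, 1.8, 1.7].
    r_free_at(refined_to, evaluated_to): hypothetical user-supplied callback that
        returns Rfree of a model refined against data to `refined_to`, computed
        only over reflections to `evaluated_to`.
    tolerance: how much Rfree may rise before a step is rejected.
    Returns the highest-resolution cutoff accepted.
    """
    best = cutoffs[0]
    for previous, candidate in zip(cutoffs, cutoffs[1:]):
        # Score both models on the same, lower-resolution reflection set so the
        # extra weak data do not bias the comparison.
        delta = r_free_at(candidate, previous) - r_free_at(previous, previous)
        if delta > tolerance:
            break  # adding this shell made the model worse; stop here
        best = candidate
    return best


# Toy callback with made-up numbers (it ignores evaluated_to):
# the model improves down to 1.7A and degrades beyond that.
def toy_r_free(refined_to, evaluated_to):
    return 0.25 + (refined_to - 1.7) ** 2


print(paired_refinement([2.0, 1.9, 1.8, 1.7, 1.6, 1.5], toy_r_free))  # prints 1.7
```

Stopping at the first rejected step keeps the number of refinement runs small. Note that this loop cannot recover if a later, even higher-resolution step would have helped again.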