Re: [ccp4bb] Concerns about statistics

2013-06-15 Thread Ed Pozharski

On 06/14/2013 07:00 AM, John R Helliwell wrote:
Alternatively, at poorer resolutions than that, you can monitor if the 
Cruickshank-Blow Diffraction Precision Index (DPI) improves or not as 
more data are steadily added to your model refinements.

Dear John,

Unfortunately, the behavior of DPIfree is less than satisfactory here: 
in a couple of cases I looked at, it just steadily improves with 
resolution.  The example I have in front of me right now extends the 
resolution from 2.0A to 1.55A, and DPIfree goes down from ~0.17A to 
0.09A at an almost constant pace (it slows down from 0.021 A/0.1A to 
0.017 A/0.1A around 1.75A).


Notice that in this specific case I/sigI at 1.55A is ~0.4 and 
CC(1/2)~0.012 (even this non-repentant big-endian couldn't argue there 
is good signal there).


DPIfree is essentially proportional to Rfree * d^(2.5)  (this is 
assuming that No~1/d^3, Na and completeness do not change).  To keep up 
with resolution changes, Rfree would have to go up ~1.9 times, and 
obviously that is not going to happen no matter how much weak data I 
throw in.
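
The scaling argument above can be checked with a few lines of Python. This is 
only a sketch: it assumes nothing beyond the proportionality stated in this 
post (DPIfree ~ Rfree * d^2.5 with Na and completeness fixed), and the 
function name is made up for illustration.

```python
def rfree_factor_for_constant_dpi(d_low: float, d_high: float,
                                  exponent: float = 2.5) -> float:
    """Factor by which Rfree must grow for DPIfree to stay constant
    when the resolution limit moves from d_low to d_high (Angstrom).

    Assumes DPIfree is proportional to Rfree * d**exponent.
    """
    return (d_low / d_high) ** exponent


factor = rfree_factor_for_constant_dpi(2.0, 1.55)
print(f"Rfree would need to rise {factor:.2f}-fold")  # about 1.9
```

For comparison, the quoted drop in DPIfree from ~0.17A to 0.09A is itself a 
factor of about 1.9, which is roughly what the d^2.5 term alone would produce 
if Rfree barely changed.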


The maximum-likelihood e.s.u. reported by Refmac makes more sense in 
this particular case as it clearly slows down big time around 1.77A (see 
https://plus.google.com/photos/113111298819619451614/albums/5889708830403779217). 
Coincidentally, Rfree also starts going up rapidly around the same 
resolution.  If anyone is curious what I/sigI is at the breaking 
point, it's ~1.5, and CC(1/2)~0.6.  And to bash Rmerge a little more, 
it's 112%.


So there are two questions I am very much interested in here.

a) Why is DPIfree so bad at this?  Can we even believe it given its 
erratic behavior in this scenario?


b) I would normally set up a simple data mining project to see how 
common this ML_esu behavior is, but there is no easily accessible source 
of data processed to beyond I/sigI=2, let alone I/sigI=1 (are structural 
genomics folks reading this and do they maybe have such data to mine?).  
I can look into all of my own datasets, but that would be a biased 
selection of several crystal forms.  Perhaps others have looked into 
this too; if so, what are your observations?  Or maybe you have a dataset 
processed way beyond I/sigI=1 and are willing either to share it with me 
together with a final model, or to run refinement at a bunch of different 
resolutions and report the result (I can provide bash scripts as needed).


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Concerns about statistics

2013-06-15 Thread Jrh
Dear Ed,
Thank you for this.
Indeed, I have not pushed into the domain of I/sigI as low as 0.4 or CC1/2 as 
low as 0.012, so I do not have an answer to your query at these extremes.
But I concede I am duly corrected by your example. My email did not specify 
how far one could follow the plateau of DPI, and I was certainly not 
considering such an extreme as you have investigated.
Best wishes,
Yours sincerely,
John

Prof John R Helliwell DSc
 
 



Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread Dirk Kostrewa

Dear Andrea,

I agree with Tim and still cut the resolution at I/sigma=2. In my 
experience, including higher resolution shells with poorer 
signal-to-noise never changed the apparent resolution of the electron 
density maps.
In addition, the high resolution limit at I/sigma=2 coincides very 
well with the point where the correlation coefficient curve of 
Fo vs. Fo + Gauss(0,1)*sigma(Fo), reported by BUSTER, crosses the 
recommended lower limit of 0.9.


And please note, CC*=0.5 corresponds to CC(1/2)=0.143. In my very 
limited experience, I/sigma=2 corresponds to roughly CC(1/2)~0.7.
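
The correspondence quoted above follows from the relation given by Karplus & 
Diederichs, CC* = sqrt(2*CC(1/2) / (1 + CC(1/2))). A small sketch for 
converting in both directions (the function names are my own, and CC(1/2) is 
assumed to lie between 0 and 1):

```python
import math


def cc_star(cc_half: float) -> float:
    """CC* estimated from CC(1/2); valid for 0 <= cc_half <= 1."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def cc_half_from_cc_star(cc_star_value: float) -> float:
    """Inverse relation: the CC(1/2) that gives a chosen CC*."""
    squared = cc_star_value ** 2
    return squared / (2.0 - squared)


print(round(cc_half_from_cc_star(0.5), 3))  # 0.143
print(round(cc_star(0.7), 2))               # 0.91
```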


Although I'm very excited about the CC(1/2) or CC* paper by Karplus & 
Diederichs, I still prefer to be on the safe side until it has been 
verified in numerous cases that choosing high resolution cutoffs based 
on CC(1/2) really leads to higher resolution structures. The recommended 
procedure to include small resolution increments in refinement to decide 
the high resolution cutoff is very time-consuming.


Best regards,

Dirk.


On 13.06.13 17:15, Andrea Edwards wrote:

Hello group,
I have some rather (embarrassingly) basic questions to ask. Mainly: when 
deciding the resolution limit, which statistics are the most important? I have 
always been taught that the highest resolution bin should be chosen with I/sig 
no less than 2.0, Rmerge no more than 40%, and %Completeness as high as 
possible. However, I am currently faced with a set of statistics that are 
clearly outside these criteria. Is it acceptable to cut off the resolution 
using I/sig as low as 1.5 as long as the completeness is greater than 75%? 
Another way to put this: if %completeness is the new criterion for choosing 
your resolution limit (instead of Rmerge or I/sig), then what %completeness is 
too low to be considered? Also, I am aware that Rmerge increases with 
redundancy. Is it acceptable to report Rmerge (or Rsym) at 66% and 98% with 
redundancy at 3.8 and 2.4 for the highest resolution bin of these crystals? 
I appreciate any comments.
-A


--

***
Dirk Kostrewa
Gene Center Munich
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:    www.genzentrum.lmu.de
***


Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread Tim Gruene

On 06/14/2013 11:43 AM, Dirk Kostrewa wrote:
 [...] The recommended procedure to include small resolution
 increments in refinement to decide the high resolution cutoff is
 very time-consuming.
... and very subjective: noise can look very unnoisy if you know
what you are looking for!

Best,
Tim



--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread Steiner, Roberto
BTW there's also an earlier paper (properly cited in Karplus & Diederichs 
2012) showing the benefit of weak 'high-resolution' reflections.

Acta Crystallogr D Biol Crystallogr. 2010 Sep;66(Pt 9):988-1000. doi: 
10.1107/S0907444910029938. Epub 2010 Aug 13.
Inclusion of weak high-resolution X-ray data for improvement of a group II 
intron structure.
Wang J.
Department of Molecular Biophysics and Biochemistry, Yale University, New 
Haven, CT 06520, USA. jimin.w...@yale.edu

Abstract
It is common to report the resolution of a macromolecular structure with the 
highest resolution shell having an averaged I/sigma(I) > or = 2. Data beyond 
the resolution thus defined are weak and often poorly measured. The exclusion 
of these weak data may improve the apparent statistics and also leads to claims 
of lower resolutions that give some leniency in the acceptable quality of 
refined models. However, the inclusion of these data can provide additional 
strong constraints on atomic models during structure refinement and thus help 
to correct errors in the original models, as has recently been demonstrated for 
a protein structure. Here, an improved group II intron structure is reported 
arising from the inclusion of these data, which helped to define more accurate 
solvent models for density modification during experimental phasing steps. With 
the improved resolution and accuracy of the experimental phases, extensive 
revisions were made to the original models such that the correct tertiary 
interactions of the group II intron that are essential for understanding the 
chemistry of this ribozyme could be described.

Best wishes
Roberto


Roberto A. Steiner
Group Leader
Randall Division of Cell and Molecular Biophysics
King's College London
roberto.stei...@kcl.ac.uk

Room 3.10A
New Hunt's House
Guy's Campus
SE1 1UL
London

Phone 0044 20 78488216
Fax   0044 20 78486435


Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread John R Helliwell
Dear Andrea,
Checking the quality of the electron density maps as one adds more data has
been correctly mentioned.

In chemical crystallography one can monitor the sigmas of the bond distances
and angles, i.e. keep adding data at ever higher resolution until those sigmas
start to deteriorate.

The equivalent in protein crystallography can be done if one has sufficient
resolution for the least squares model refinement full matrix inversion to
work. Alternatively, at poorer resolutions than that, you can monitor if
the Cruickshank-Blow Diffraction Precision Index (DPI) improves or not as
more data are steadily added to your model refinements.
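
For readers who want to track this themselves, here is a minimal sketch of the
Rfree-based DPI in the form it is commonly quoted from Cruickshank,
sigma(x) = (Na/No)^(1/2) * C^(-1/3) * Rfree * dmin. The argument names are
illustrative only, and programs may count the observations differently (for
example, using only the free reflections), so treat the numbers as indicative.

```python
import math


def dpi_free(n_atoms: int, n_obs: int, completeness: float,
             r_free: float, d_min: float) -> float:
    """Rfree-based Diffraction Precision Index in Angstrom.

    n_atoms      -- number of atoms in the model (Na)
    n_obs        -- number of observations used (No); an assumption here
    completeness -- fractional completeness C, between 0 and 1
    r_free       -- Rfree as a fraction (e.g. 0.25)
    d_min        -- high resolution limit in Angstrom
    """
    return (math.sqrt(n_atoms / n_obs)
            * completeness ** (-1.0 / 3.0)
            * r_free * d_min)


# Hypothetical numbers, purely to show the call.
example = dpi_free(n_atoms=2000, n_obs=20000, completeness=1.0,
                   r_free=0.25, d_min=2.0)
print(f"DPIfree = {example:.3f} A")
```

Repeating the calculation after each refinement at a new resolution limit
gives the curve whose plateau one would look for.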

To quote the resolution in your article at which I/sigI crosses 2, whilst
not cutting the data at that point, can assist the reader in knowing the
data quality you have worked with. Making the raw diffraction data images
available would make things totally clear to the reader.

Best wishes,
John
Professor John R Helliwell DSc





--


Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread Boaz Shaanan
In their paper, K & D monitored the electron density for their cofactor and 
could verify that adding higher resolution shells based on the CC1/2 statistics 
improved the way it looked. I'm not sure they monitored bond distances and/or 
esd's, but those may well have been affected by restraints and weights anyway 
at the resolution they worked at (~1.45 A?). It might be more difficult to 
judge the effect of including higher resolution shells if there isn't a feature 
that is easy to monitor as you increase the resolution. In one of the cases 
that I'm working on I certainly noticed better geometry and e.d. for the 
cofactor upon adding (somewhat) higher resolution shells. 

 Cheers,

   Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220  Skype: boaz.shaanan
Fax:   972-8-647-2992 or 972-8-646-1710







Re: [ccp4bb] Concerns about statistics

2013-06-14 Thread Andrew Leslie
Hi Boaz,

The improvement you see in the cofactor geometry after inclusion of higher 
resolution data is very interesting, but is it possible that this is a 
secondary effect resulting from the additional X-ray data changing the 
relative weighting of the X-ray and stereochemical terms in the refinement? 

Could you get a similar geometry improvement simply by changing the relative 
weighting (using only the original data), or would this only come with a 
penalty in other statistics?

Did you quantify the improvement in the cofactor density, for example by a 
correlation coefficient?

Cheers,

Andrew



Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Nat Echols
On Thu, Jun 13, 2013 at 8:15 AM, Andrea Edwards edwar...@stanford.edu wrote:

 I have some rather (embarrassingly) basic questions to ask. Mainly.. when
 deciding the resolution limit, which statistics are the most important? I
 have always been taught that the highest resolution bin should be chosen
 with I/sig no less than 2.0, Rmerg no less than 40%, and %Completeness
 should be as high as possible. However, I am currently encountered with a
 set of statistics that are clearly outside this criteria. Is it acceptable
 cut off resolution using I/sig as low as 1.5 as long as the completeness is
 greater than 75%? Another way to put this.. if % completeness is the new
 criteria for choosing your resolution limit (instead of Rmerg or I/sig),
 then what %completeness is too low to be considered? Also, I am aware that
 Rmerg increases with redundancy, is it acceptable to report Rmerg (or Rsym)
 at 66% and 98% with redundancy at 3.8 and 2.4 for the highest resolution
 bin of these crystals? I appreciate any comments.


A (probably) better way:

http://www.ncbi.nlm.nih.gov/pubmed/22628654

Short version: don't try to use simplistic rules; instead, use all data
that actually improve the model.  In practice, what I've noticed in some
recent articles is (paraphrasing) "data extend to 2.5Å with an I/sigma of 2
in the highest-resolution shell, but we used data to 2.2Å as suggested by
Karplus & Diederichs".  This allows you to actually use as much data as
possible while still (hopefully) pleasing any pedantic reviewers.
(Substitute 90% completeness or R-merge of whatever for the I/sigma cutoff
if you prefer; the end result will still be the same.)

-Nat


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Andrea Edwards
In this case, should the author report a correlation coefficient along with the 
other standard statistics (I/sigI, Rmerge, completeness, redundancy, etc.)? What 
about Rpim instead of Rmerge? And if Rpim is reported, what should be the 
criterion for the resolution cutoff?
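
To make the difference between the two statistics concrete, here is a small
sketch using the usual textbook definitions: Rmerge sums |I_i - <I>| over all
measurements, while Rpim weights each reflection's sum by sqrt(1/(n-1)), so it
does not grow with redundancy in the way Rmerge does. The function name and
the toy intensities are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Hkl = Tuple[int, int, int]


def merging_r_factors(observations: Dict[Hkl, List[float]]) -> Tuple[float, float]:
    """Return (Rmerge, Rpim) for intensities grouped by unique reflection."""
    num_merge = 0.0
    num_pim = 0.0
    denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # a single measurement gives no estimate of spread
        mean_i = sum(intensities) / n
        abs_dev = sum(abs(i - mean_i) for i in intensities)
        num_merge += abs_dev
        num_pim += math.sqrt(1.0 / (n - 1)) * abs_dev
        denom += sum(intensities)
    return num_merge / denom, num_pim / denom


# Toy data: one reflection measured four times, one measured twice.
toy = {(1, 0, 0): [100.0, 110.0, 90.0, 100.0], (2, 0, 0): [50.0, 60.0]}
r_merge, r_pim = merging_r_factors(toy)
print(f"Rmerge = {r_merge:.3f}, Rpim = {r_pim:.3f}")
```

The sketch assumes at least one reflection was measured more than once;
otherwise the denominator is zero.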

Also, if this paper is the new standard, how should we regard statistics 
reported in the literature? Or, more importantly, how do we go about reviewing 
current literature that does not report this statistic?

- Original Message -
From: Klaus Fütterer k.futte...@bham.ac.uk
To: Andrea Edwards edwar...@stanford.edu
Sent: Thursday, June 13, 2013 8:27:33 AM
Subject: Re: [ccp4bb] Concerns about statistics

The commonly accepted answer is in:
Linking crystallographic model and data quality.
Karplus PA, Diederichs K.
Science. 2012 May 25;336(6084):1030-3. doi: 10.1126/science.1218231.


Best wishes, 


Klaus Fütterer 



=== 

Dr. Klaus Fütterer 
Deputy Head of School 
Undergraduate Admissions 
Room 717, Biosciences Tower 

School of Biosciences
University of Birmingham
Edgbaston
Birmingham, B15 2TT, UK
P: +44-(0)-121-414 5895
F: +44-(0)-121-414 5925
E: k.futte...@bham.ac.uk
W: http://tinyurl.com/futterer-lab
=== 









Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Andrea Edwards
..and Rmerg seems to be meaningless for judging data quality?

- Original Message -
From: Klaus Fütterer k.futte...@bham.ac.uk
To: Andrea Edwards edwar...@stanford.edu
Sent: Thursday, June 13, 2013 8:49:13 AM
Subject: Re: [ccp4bb] Concerns about statistics

Seems you are reviewing a paper at present. If this is indeed the case, it is 
fair to ask the authors to supply CC1/2 for their data or to rationalise their 
hi-res cut-off in light of that statistic. For older papers, you obviously 
can't do that. As always in stats, there is no sharp line. 

My personal take is: I/sig > 1.5 in the high res shell with at least 85% 
completeness (at that cut-off). 15 years ago, I would have said I/sigI > 3 with 
at least 75% completeness, from which you can see how arbitrary the figures 
are. The merit of the Karplus & Diederichs paper is to demonstrate changes in 
the electron density map in relation to cut-offs. 


Klaus 







=== 

Dr. Klaus Fütterer 
Deputy Head of School 
Undergraduate Admissions 
Room 717, Biosciences Tower 

School of Biosciences
University of Birmingham
Edgbaston
Birmingham, B15 2TT, UK
P: +44-(0)-121-414 5895
F: +44-(0)-121-414 5925
E: k.futte...@bham.ac.uk
W: http://tinyurl.com/futterer-lab
=== 







On 13 Jun 2013, at 16:44, Andrea Edwards wrote: 



In this case, the author should report a correlation coefficient along with the 
other standard statistics (I/sigI, Rmerg, completeness, redundancy, etc.)? What 
about Rpim instead of Rmerg? And if Rpim is reported, what should be the 
criterion for resolution cutoff? 

Also, if this paper is the new standard, how should we regard statistics 
reported in the literature? Or, more importantly, how do we go about reviewing 
current literature that does not report this statistic? 

- Original Message - 
From: Klaus Fütterer  k.futte...@bham.ac.uk  
To: Andrea Edwards  edwar...@stanford.edu  
Sent: Thursday, June 13, 2013 8:27:33 AM 
Subject: Re: [ccp4bb] Concerns about statistics 

The commonly accepted answer is in: 
Karplus PA, Diederichs K. Linking crystallographic model and data quality. 
Science. 2012 May 25;336(6084):1030-3. doi: 10.1126/science.1218231. 


Best wishes, 


Klaus Fütterer 










On 13 Jun 2013, at 16:15, Andrea Edwards wrote: 



Hello group, 
I have some rather (embarrassingly) basic questions to ask. Mainly: when 
deciding the resolution limit, which statistics are the most important? I have 
always been taught that the highest resolution bin should be chosen with I/sig 
no less than 2.0, Rmerg no more than 40%, and %completeness as high as 
possible. However, I am currently faced with a set of statistics that are 
clearly outside these criteria. Is it acceptable to cut off the resolution using 
I/sig as low as 1.5 as long as the completeness is greater than 75%? Another 
way to put this: if %completeness is the new criterion for choosing your 
resolution limit (instead of Rmerg or I/sig), then what %completeness is too 
low to be considered? Also, I am aware that Rmerg increases with redundancy; is 
it acceptable to report Rmerg (or Rsym) at 66% and 98% with redundancy at 3.8 
and 2.4 for the highest resolution bin of these crystals? I appreciate any 
comments. 
-A 


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Tim Gruene

Dear Andrea,

unless you are desperately longing for resolution, I normally cut the
resolution where I/sigI falls below 2.0. You should, however, make up your own
rules as to how to determine I/sigI: it must be computed in resolution
shells, and if you choose the shells wide enough the strong data might
cover up for the weak ones. I usually run XSCALE or xprep and use their
preset resolution shells (e.g. the shells in CORRECT.LP are wider than
those in XSCALE).
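
A minimal sketch of the shell-width effect described above, assuming the statistic is the plain mean of I/sigI over the reflections in each shell (programs differ in the exact average they report). The function name `mean_i_over_sigma_by_shell` and the reflection values are invented for illustration and are not taken from XSCALE or xprep.

```python
def mean_i_over_sigma_by_shell(reflections, edges):
    """Mean I/sigI per resolution shell.

    reflections: list of (d_spacing_in_A, I, sigI) tuples.
    edges: shell limits in A, from low to high resolution, e.g. [2.0, 1.8, 1.6].
    """
    shells = []
    for d_max, d_min in zip(edges[:-1], edges[1:]):
        # Keep reflections whose d-spacing falls inside this shell.
        ratios = [i / s for d, i, s in reflections if d_min <= d < d_max and s > 0]
        mean = sum(ratios) / len(ratios) if ratios else float("nan")
        shells.append((d_max, d_min, len(ratios), mean))
    return shells


# Invented data: ten decent reflections near 1.9 A, ten weak ones near 1.7 A.
reflections = [(1.9, 4.0, 1.0)] * 10 + [(1.7, 0.5, 1.0)] * 10

narrow = mean_i_over_sigma_by_shell(reflections, [2.0, 1.8, 1.6])
wide = mean_i_over_sigma_by_shell(reflections, [2.0, 1.6])

for d_max, d_min, n, mean in narrow + wide:
    print(f"{d_max:.1f}-{d_min:.1f} A  n={n:2d}  <I/sigI>={mean:.2f}")
```

With these made-up numbers the two narrow shells average 4.00 and 0.50, while the single wide shell averages 2.25 and would pass a 2.0 threshold even though its outer half is mostly noise.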

Rmerge is obsolete, and I always encourage people to use Rmeas (aka
Rrim, not Rpim!) instead. Publishing Rmerge is a little bit like saying,
we have always gone for lunch at 1pm, so we will stick to that - it's a
habit despite better knowledge.
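
For reference, a small sketch of how the three merging R-factors differ, using the standard textbook definitions: Rmeas multiplies each reflection's deviation sum by sqrt(n/(n-1)) and Rpim by sqrt(1/(n-1)), where n is the number of observations of that reflection. Restricting both sums to reflections with n >= 2 is an assumption here (conventions vary between programs), and the intensities are invented.

```python
import math


def merging_r_factors(observations):
    """Return (Rmerge, Rmeas, Rpim) for a dict of hkl -> list of intensities."""
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # assumption: singly measured reflections are skipped
        mean = sum(intensities) / n
        dev = sum(abs(i - mean) for i in intensities)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev   # multiplicity-corrected
        num_pim += math.sqrt(1 / (n - 1)) * dev    # precision of the merged mean
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom, num_pim / denom


# Invented observations: one reflection measured twice, one measured four times.
observations = {
    (1, 0, 0): [90.0, 110.0],
    (0, 1, 0): [40.0, 60.0, 50.0, 50.0],
}
r_merge, r_meas, r_pim = merging_r_factors(observations)
print(f"Rmerge={r_merge:.3f}  Rmeas={r_meas:.3f}  Rpim={r_pim:.3f}")
```

Rmeas is never smaller than Rmerge, and Rpim is never larger, which is one reason the three should not be compared against the same threshold.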

I do not consider Rmerge/Rmeas in order to decide about the
resolution cut-off and I have used data with Rmeas > 200% provided
I/sigI > 2.0.

The completeness does not really say much in terms of data quality:
with a little tweaking of the parameters, most integration programs
would give you about 99.9% completeness even if there is mostly noise
on the detector!

The method suggested by Karplus and Diederichs works well, i.e.
checking various resolution ranges, e.g. in steps of 0.2A, and looking
at the maps - but it is time consuming and I would only apply it at
low resolution where +/- 0.1A can make quite a difference (actually in
both directions: if you include too much noise, the maps become more
difficult to interpret).

Best,
Tim

P.S.: Your question is not embarrassing, it is an ongoing discussion
with no definite answer.


-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Ed Pozharski
On Thu, 2013-06-13 at 08:44 -0700, Andrea Edwards wrote:
 In this case, the author should report a correlation coefficient along
 with the other standard statistics (I/sigI, Rmerg, Completeness,
 redundancy, etc.)? 

Won't hurt.  

 What about Rpim instead of Rmerg? and if Rpim is reported, what should
 be the criteria for resolution cutoff?

Rmerge has been known to be deeply flawed for ~15 years.  IMHO, it should
not be reported at all.  While Rpim is better, the whole point of
Karplus & Diederichs is that R-type measures are not very useful in
deciding resolution cutoff.

 Also, if this paper is the new standard how should we regard
 statistic reported in the literature? 

We should keep in mind that conservative resolution cutoff criteria have
been used in the field for decades. 

 Or.. more importantly, how do we go about reviewing current literature
 that does not report this statistic?

Structures refined up to I/sigma=2 should be considered likely to have
been cut off too conservatively, i.e. useful higher-resolution data were
probably left out.

With that said, I am pretty sure that in the vast majority of cases
structural conclusions derived with I/s=2 vs CC1/2=0.5 vs DR=0 cutoff
will be essentially the same.
 

-- 
I'd jump in myself, if I weren't so good at whistling.
   Julian, King of Lemurs


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Tim Gruene



On 06/13/2013 06:16 PM, Ed Pozharski wrote:
 [...] With that said, I am pretty sure that in vast majority of
 cases structural conclusions derived with I/s=2 vs CC1/2=0.5 vs
 DR=0 cutoff will be essentially the same.

Hi Ed,
in my experience, CC(1/2) > 0.7 corresponds quite well to I/sigI > 2.0
rather than CC(1/2) > 0.5 (again, with the default resolution shells
from xprep, which also plots CC(1/2) vs. resolution). Are the above numbers
based on experience, too? If so, which program do you usually use to
look at these statistics?

Tim


-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Ed Pozharski
Tim,

my personal preference has always been I/sigI=1.  In my Scalepack days, I
always noticed that ~30% of the reflections in the I/sigI=1 shells had
I/sigI>2, and formed an unverified belief that there should be some
information there.

In my experience, CC1/2=0.5 would normally yield I/sigI~1, not 2.  This
is based predominantly on Scala/Aimless.
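
For readers unfamiliar with the statistic, here is a rough sketch of how CC1/2 can be computed: the observations of each unique reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of half-averages is taken. This is a simplified illustration, not the implementation in Scala, Aimless or xprep; the simulated intensities (exponentially distributed "true" values plus Gaussian noise) and all function names are assumptions.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cc_half(observations, seed=0):
    """CC1/2 from a dict of hkl -> list of repeated intensity measurements."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # cannot be split into two halves
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(mean(shuffled[:mid]))
        half_b.append(mean(shuffled[mid:]))
    return pearson(half_a, half_b)


# Simulated shell: 500 unique reflections, 4 observations each (invented data).
rng = random.Random(1)
true_i = [rng.expovariate(1.0) for _ in range(500)]


def simulate(sigma):
    return {k: [t + rng.gauss(0.0, sigma) for _ in range(4)]
            for k, t in enumerate(true_i)}


for sigma in (0.2, 1.0, 5.0):
    print(f"noise sigma={sigma:3.1f}  CC1/2={cc_half(simulate(sigma)):.2f}")
```

How a given CC1/2 maps onto I/sigI depends on the error model and on the shell definitions, which may be part of why different programs suggest different equivalences.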

Cheers,

Ed.


-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Robbie Joosten
Hi Andrea,

Any choice you make about a resolution cut-off based on a rule of thumb can be 
called into question by a referee who uses a different rule of thumb. So if you 
choose a metric + cut-off that is anything less than very conservative (say, a 
cut-off at I/sigI of 1), you have to be able to defend your choice either with a 
reference or with evidence from experiments. This is where the 'paired 
refinement' of the Karplus and Diederichs paper kicks in: you can show that you 
can get useful information out of the extra high resolution reflections by 
comparing refinement results.

So what you can do is first solve your structure, build and refine using a 
conservative resolution cut-off. Once you are nearing the final stages of the 
process you can gradually go for higher resolutions using the paired refinement 
procedure. That way you have some results to support your choice of resolution 
cut-off. Who knows, when you reach the best resolution cut-off you may be able 
to add some more details to your structure model, that you would have missed 
otherwise.
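
The stepwise procedure can be sketched as a simple decision loop. In paired refinement, the model refined against the extended data and the model refined against the more conservative data are compared using R-free calculated at the same, lower resolution limit; the extension is kept only if R-free does not get worse. The function `choose_cutoff`, its `tolerance` parameter and all R-free values below are hypothetical and only illustrate the logic; they do not come from a real refinement or from PDB_REDO.

```python
def choose_cutoff(steps, tolerance=0.0):
    """Pick a resolution cut-off from a series of paired-refinement steps.

    steps: list of (d_conservative, d_extended, rfree_conservative, rfree_extended),
           ordered from the most conservative step outwards. Both R-free values
           of a step are assumed to be calculated against data to d_conservative.
    tolerance: hypothetical allowance for an insignificant rise in R-free.
    """
    best = steps[0][0]
    for d_cons, d_ext, rfree_cons, rfree_ext in steps:
        if rfree_ext - rfree_cons > tolerance:
            break  # the extra shell made the model worse: stop extending
        best = d_ext
    return best


# Invented numbers for illustration only.
steps = [
    (2.0, 1.9, 0.250, 0.247),  # adding 2.0-1.9 A data helps
    (1.9, 1.8, 0.247, 0.246),  # adding 1.9-1.8 A data still helps
    (1.8, 1.7, 0.246, 0.251),  # adding 1.8-1.7 A data hurts
]
print(f"suggested cut-off: {choose_cutoff(steps):.1f} A")
```

A table of such paired comparisons is the kind of evidence that can be shown to a referee who prefers a different rule of thumb.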

If you think that doing the paired refinement is too much work, you can try 
PDB_REDO. If you give it a PDB file with a resolution cut-off in REMARK 2 or 3 
lower than the maximal resolution of your reflection file, it will 
automatically use paired refinement to find the best resolution cut-off (yes, 
this is a self-plug!).

HTH,
Robbie Joosten

 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
 Andrea Edwards
 Sent: Thursday, June 13, 2013 17:15
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: [ccp4bb] Concerns about statistics
 
 Hello group,
 I have some rather (embarrassingly) basic questions to ask. Mainly.. when
 deciding the resolution limit, which statistics are the most important? I have
 always been taught that the highest resolution bin should be chosen with
 I/sig no less than 2.0, Rmerg no less than 40%, and %Completeness should be
 as high as possible. However, I am currently encountered with a set of
 statistics that are clearly outside this criteria. Is it acceptable cut off 
 resolution
 using I/sig as low as 1.5 as long as the completeness is greater than 75%?
 Another way to put this.. if % completeness is the new criteria for choosing
 your resolution limit (instead of Rmerg or I/sig), then what %completeness is
 too low to be considered? Also, I am aware that Rmerg increases with
 redundancy, is it acceptable to report Rmerg (or Rsym) at 66% and 98% with
 redundancy at 3.8 and 2.4 for the highest resolution bin of these crystals? I
 appreciate any comments.
 -A