Re: [ccp4bb] Highest shell standards

2007-03-26 Thread Fred. Vellieux
On Fri, 23 Mar 2007, Edward Berry wrote:

 I believe fft does not by default do this- only if you use the fillin 
 keyword? One place where it might be important is in density 
 modification/ molecular averaging. Molecular averaging can be seen
 as a numerical solution of the MR equations, finding a density map
 which (A) obeys the NCS/intercrystal symmetry and (B) yields the 
 observed F's upon Fourier transformation.  Now if on each cycle
 you set the missing F's to zero, you are requiring it to have
 as part of (B) zero amplitude for the missing reflections, which is
 more restrictive and incorrect. If instead you allow the missing F's
 to float, calculating them on each cycle from the previous map
 using the fillin option, someone has shown (don't have the
 reference handy at the moment) that the F's tend toward the true F's
 (in the case that they weren't really missing but omitted as part
 of the test).
 
 Ed

You have phase scatter plots in Acta Cryst. D51, 575-589 (1995) that show
just that: the map inversion phases from NCS averaging tend toward the
true phases. Since F's are phased quantities and since phases are more
important than amplitudes, non-random amplitudes plus non-random phases
(both from map inversion of averaged maps) lead to better electron density
maps.
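
To make the zero-versus-floating distinction concrete, here is a minimal 1-D
numpy sketch, not taken from any of the programs discussed: solvent flattening
stands in for the modification/averaging step, and the cell size, solvent
fraction and 20% missing fraction are arbitrary choices. With the missing
terms left floating they drift toward the true values; forced to zero they
obviously cannot.

import numpy as np

# Minimal 1-D sketch (illustrative only) of the two treatments of missing
# reflections in iterative density modification / averaging.  Solvent
# flattening plays the role of the modification step.

rng = np.random.default_rng(0)
n = 64
true_map = np.zeros(n)
true_map[: n // 2] = rng.random(n // 2)    # "protein" half; the rest is flat solvent
F_true = np.fft.fft(true_map)              # true structure factors
missing = rng.random(n) < 0.2              # 20% unmeasured reflections
F_obs = F_true.copy()                      # measured terms (noise-free here)

def modify(rho):
    rho = rho.copy()
    rho[n // 2:] = 0.0                     # flatten the known solvent region
    return rho

def cycle(F_work, fill_in):
    rho = np.fft.ifft(F_work).real         # current map
    F_new = np.fft.fft(modify(rho))        # map inversion after modification
    F_next = F_obs.copy()                  # measured terms are always restored
    # missing terms: either forced back to zero, or left floating at the value
    # calculated from the previous cycle's modified map (the "fillin" idea)
    F_next[missing] = F_new[missing] if fill_in else 0.0
    return F_next

for fill_in in (False, True):
    F = np.where(missing, 0.0, F_obs)      # start with the missing terms zeroed
    for _ in range(50):
        F = cycle(F, fill_in)
    err = np.mean(np.abs(F[missing] - F_true[missing]))
    print(f"fill_in={fill_in}:  mean |F - F_true| over missing terms = {err:.3f}")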

Fred.

-- 

 Fred. Vellieux, esq.

 =
 IBS J.-P. Ebel CEA CNRS UJF / LBM
 41 rue Jules Horowitz
 38027 Grenoble Cedex 01
 France
 Tel: (+33) (0) 438789605
 Fax: (+33) (0) 438785494
 =


Re: [ccp4bb] Highest shell standards

2007-03-26 Thread Michel Fodje
You are probably referring to the following works:

Caliandro et al, Acta Cryst. D61 (2005) 556-565
and Caliandro et al, Acta Cryst. D61 (2005) 1080-1087

in which they used density modification to calculate phases for
unmeasured reflections, and used the phases to extend the resolution by
calculating rough estimates of the unmeasured amplitudes. Using this technique
they could actually improve the electron density.

If I'm not mistaken, George Sheldrick has implemented this Free Lunch
algorithm in SHELXE.

/Michel

On Fri, 2007-03-23 at 08:05 -0800, Edward Berry wrote:
 If instead you allow the missing F's
 to float, calculating them on each cycle from the previous map
 using the fillin option, someone has shown (don't have the
 reference handy at the moment) that the F's tend toward the true F's
 (in the case that they weren't really missing but omitted as part
 of the test).
 
 Ed


Re: [ccp4bb] Highest shell standards

2007-03-26 Thread Edward A. Berry

Actually I was thinking of a somewhat earlier paper:

Rayment, I. Molecular replacement method at low resolution:
optimum strategy and intrinsic limitations as determined
by calculations on icosahedral virus models.
Acta Crystallogr. A39, 102-116 (1983).

But thanks for bringing the Caliandro et al. paper to my attention.
Thanks also to Fred. Vellieux for his comments, and to Pete Dunton
for explaining to me that while fft doesn't do fillin by default,
the 2MFo-DFc map coefficients from refmac5 do have fillin values
for the missing reflection, making model bias a problem when
many missing residues are included.

Now I understand Petrus's question.

Ed

Michel Fodje wrote:

You are probably referring to the following works:

Caliandro et al, Acta Cryst. D61 (2005) 556-565
and Caliandro et al, Acta Cryst. D61 (2005) 1080-1087

in which they used density modification to calculate phases for
unmeasured reflections, and used the phases to extend the resolution by
calculating rough estimates of the unmeasured amplitudes. Using this technique
they could actually improve the electron density.

If I'm not mistaken, George Sheldrick has implemented this Free Lunch
algorithm in SHELXE.

/Michel

On Fri, 2007-03-23 at 08:05 -0800, Edward Berry wrote:


If instead you allow the missing F's
to float, calculating them on each cycle from the previous map
using the fillin option, someone has shown (don't have the
reference handy at the moment) that the F's tend toward the true F's
(in the case that they weren't really missing but omitted as part
of the test).

Ed


Re: [ccp4bb] Highest shell standards -oops-

2007-03-26 Thread Edward A. Berry

 -oops-
many missing REFLECTIONS are included.


Re: [ccp4bb] Highest shell standards

2007-03-26 Thread price
Isn't automatically including fabricated data for missing reflections 
a really bad idea for anisotropic data, where most reflections are 
missing at high resolution?  Shouldn't there be a big flashing red 
flag alerting the user to what's been done?

Phoebe

At 01:22 PM 3/26/2007, Edward A. Berry wrote:

Actually I was thinking of a somewhat earlier paper:

Rayment, I. Molecular replacement method at low resolution:
optimum strategy and intrinsic limitations as determined
by calculations on icosahedral virus models.
Acta Crystallogr. A39, 102-116 (1983).

But thanks for bringing the Caliandro et al. paper to my attention.
Thanks also to Fred. Vellieux for his comments, and to Pete Dunton
for explaining to me that while fft doesn't do fillin by default,
the 2MFo-DFc map coefficients from refmac5 do have fillin values
for the missing reflection, making model bias a problem when
many missing residues are included.

Now I understand Petrus's question.

Ed

Michel Fodje wrote:

You are probably referring to the following works:
Caliandro et al, Acta Cryst. D61 (2005) 556-565
and Caliandro et al, Acta Cryst. D61 (2005) 1080-1087
in which they used density modification to calculate phases for
unmeasured reflections, and used the phases to extend the resolution by
calculating rough estimates of the unmeasured amplitudes. Using this technique
they could actually improve the electron density.
If I'm not mistaken, George Sheldrick has implemented this Free Lunch
algorithm in SHELXE.
/Michel
On Fri, 2007-03-23 at 08:05 -0800, Edward Berry wrote:


If instead you allow the missing F's
to float, calculating them on each cycle from the previous map
using the fillin option, someone has shown (don't have the
reference handy at the moment) that the F's tend toward the true F's
(in the case that they weren't really missing but omitted as part
of the test).

Ed


---
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
fax 773 702 0439
http://bmb.bsd.uchicago.edu/index.html
http://www.nasa.gov/mission_pages/cassini/multimedia/pia06064.html 


Re: [ccp4bb] Highest shell standards

2007-03-24 Thread Poul Nissen
I very much agree with Holton. I also find the following to be a simple
and very helpful argument in the discussion (it was given to me by Morten
Kjeldgaard):

"Your model should reproduce your weak data by weak Fc's - therefore you
need your weak reflections in the refinement."

I personally like to judge a proper data cutoff from the Wilson plot - as
long as it looks right, the data are OK to use. If, on the other hand, the plot
changes slope, levels out or even rises at some point, then I cut the data
there.
The plot will often look fine down to an I/sigI level of about 1 -
1.5, other times 2 or 3 - it really depends on the crystal (and the
assumption of valid Wilson statistics, of course).

Poul

 I generally cut off integration at the shell where I/sigI < 0.5 and
 then cut off merged data where MnI/sd(I) ~ 1.5.  It is always easier
 to cut off data later than to re-integrate it.  I never look at the
 Rmerge, Rsym, Rpim or Rwhatever in the highest resolution shell.  This
 is because R-statistics are inappropriate for weak data.

 Don't believe me?  If anybody out there doesn't think that spots with no
 intensity are important, then why are you looking so carefully at your
 systematic absences? ;)  The Rabsent statistic (if it existed) would
 always be dividing by zero, and giving wildly varying numbers > 100%
 (unless your absences really do have intensity, but then they are not
 absent, are they?).

   There is information in the intensity of an absent spot (a
 systematic absence, or any spot beyond your true resolution limit).
 Unfortunately, measuring zero is hard because the signal to noise
 ratio will always be ... zero.  Statistics as we know it seems to fear
 this noise >> signal domain.  For example, the error propagation Ulrich
 pointed out (F/sigF = 2 I/sigI) breaks down as I approaches zero.  If
 you take F=0, and add random noise to it and then square it, you will
 get an average value for I=F^2 that always equals the square of the
 noise you added.  It will never be zero, no matter how much averaging
 you do.  Going the other way is problematic because if I really is
 zero, then half of your measurements of it will be negative (and sqrt(I)
 will be imaginary (ha ha)).  This is the problem TRUNCATE tries to
 solve.
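
The F=0 point is easy to check numerically; a quick illustrative sketch (the
sigma value is arbitrary):

import numpy as np

# Take F = 0, add Gaussian noise of standard deviation sigma, square it:
# the average "intensity" converges to sigma**2, not to zero, no matter
# how much you average.

rng = np.random.default_rng(1)
sigma = 3.0
for n in (100, 10_000, 1_000_000):
    F_noisy = rng.normal(loc=0.0, scale=sigma, size=n)   # measurements of F = 0
    I = F_noisy**2
    print(f"n = {n:>9}:  <I> = {I.mean():7.3f}   (sigma**2 = {sigma**2})")

# Going the other way: if the true I is zero, the measured (background-
# subtracted) intensities scatter symmetrically about zero, so about half
# come out negative and sqrt(I) would be imaginary; this is the situation
# TRUNCATE has to deal with.
I_meas = rng.normal(loc=0.0, scale=1.0, size=10_000)
print("fraction of negative I measurements:", (I_meas < 0).mean())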

 Despite these difficulties, IMHO, cutting out weak data from a ML
 refinement is a really bad idea.  This is because there is a big
 difference between "1 +/- 10" and "I don't know, could be anything" when
 you are fitting a model to data.  ESPECIALLY when your data/parameters
 ratio is already ~1.0 or less.  This is because the DIFFERENCE between
 Fobs and Fcalc relative to the uncertainty of Fobs is what determines
 whether or not your model is correct to within experimental error.  If
 weak, high-res data are left out, then they can become a dumping ground
 for model bias.  Indeed, there are some entries in the PDB (particularly
 those pre-dating when we knew how to restrain B factors properly) that
 show an up-turn in intensity beyond the quoted resolution cutoff (if
 you look at the Wilson plot of Fcalc).  This is because the refinement
 program was allowed to make Fcalc beyond the resolution cutoff anything
 it wanted (and it did).

 The only time I think cutting out data because it is weak is appropriate
 is for map calculations. Leaving out an HKL from the map is the same as
 assigning it to zero (unless it is a sigma-a map that fills in with
 Fcalcs).  In maps, weak data (I/sd < 1) will (by definition) add more
 noise than signal.  In fact, calculating an anomalous difference
 Patterson with DANO/SIGDANO as the coefficients instead of DANO can
 often lead to better maps.

 Yes, your Rmerge, Rcryst and Rfree will all go up if you include weak
 data in your scaling and refinement, but the accuracy of your model will
 improve.  If you (or your reviewer) are worried about this, I suggest
 using the old, traditional 3-sigma cutoff for data used to calculate R.
 Keep the anachronisms together.  Yes, the PDB allows this.  In fact,
 (last time I checked) you are asked to enter what sigma cutoff you used
 for your R factors.

 In the last 100 days (3750 PDB depositions), the REMARK   3   DATA
 CUTOFF stats are thus:

 sigma-cutoff    popularity
 NULL             13.84%
 NONE             13.65%
 -2.5 to -1.5      0.37%
 -0.5 to  0.5     62.48%
  0.5 to  1.5      2.03%
  1.5 to  2.5      6.51%
  2.5 to  3.5      0.61%
  3.5 to  4.5      0.24%
    > 4.5          0.27%

 So it would appear mine is not a popular attitude.
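
The post does not say how this tally was made; one way such a survey could be
reproduced from a local collection of PDB files is sketched below. The
./pdb/*.pdb path and the assumption that the value sits on a
"REMARK   3   DATA CUTOFF ... (SIGMA(F)) :" record are illustrative only, and
the exact layout of that record varies between refinement programs.

import glob
from collections import Counter

counts = Counter()
for path in glob.glob("./pdb/*.pdb"):                  # hypothetical local mirror
    with open(path) as fh:
        for line in fh:
            if line.startswith("REMARK   3   DATA CUTOFF") and "(SIGMA(F))" in line:
                value = line.split(":", 1)[-1].strip()
                if value in ("NULL", "NONE"):
                    counts[value] += 1
                else:
                    try:
                        counts[f"{round(float(value), 1)}"] += 1
                    except ValueError:
                        counts["unparsed"] += 1
                break                                  # one such record per entry

total = sum(counts.values()) or 1
for key, n in counts.most_common():
    print(f"{key:>8}  {100.0 * n / total:5.2f}%")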

 -James Holton
 MAD Scientist


 Shane Atwell wrote:

 Could someone point me to some standards for data quality, especially
 for publishing structures? I'm wondering in particular about highest
 shell completeness, multiplicity, sigma and Rmerge.

 A co-worker pointed me to a '97 article by Kleywegt and Jones:

 http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

 To decide at which shell to cut off the resolution, we nowadays tend
 to use the following criteria for the highest shell: completeness > 80
 %, 

Re: [ccp4bb] Highest shell standards

2007-03-23 Thread Eleanor Dodson
This is a good point - I had thought that D would be very low for an 
incomplete shell, but that doesn't seem to be true..


Garib - what do you think?
Eleanor


Petrus H Zwart wrote:
I typically process my data to a maximum I/sig near 1, and 
completeness in

the highest resolution shell to 50% or greater. It



What about maps computed of very incomplete datasets at high resolution? Don't 
you get a false sense of details when the missing reflections are filled in 
with DFc when computing a 2MFo-DFc map?

P


  


Re: [ccp4bb] Highest shell standards

2007-03-23 Thread Santarsiero, Bernard D.
I seem to recall this being discussed at some point.

For the difference electron density map, there clearly isn't a downside to
the loss of reflections, i.e., the coefficients in the map generation are
formally zero for Fo-Fc (with all the scaling, weight, sigma-A bits in
there). If the phases are fairly good, then you are assuming that the Fobs
agrees perfectly with the Fcalc. You don't gain any new details in the
map, but the map isn't distorted by the loss of these terms.

As for the 2Fo-Fc (2mFo-DFc, or something like that) electron density map,
it again assumes that the phases are in good shape, and you essentially
lose any new information you could gain from the addition of new Fobs
terms, but the map isn't distorted since the terms are zero.
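
The equivalence being used here (and in James Holton's earlier post) is just
the linearity of the Fourier synthesis; in standard notation,

\[
\rho(\mathbf{x}) \;=\; \frac{1}{V} \sum_{\mathbf{h}} F(\mathbf{h})\,
\mathrm{e}^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}},
\]

so omitting a reflection from the sum is identical to including it with a
coefficient of zero. For an Fo-Fc difference map the omitted coefficient is
(Fo - Fc) = 0, i.e. the map behaves as if Fobs agreed exactly with Fcalc for
that reflection.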

I seem to recall in the dark ages that you could make an Fobs map and
include Fcalc, or some fraction of Fcalc (like 0.5 Fc), as the Fobs term
for missing reflections. That way the amplitudes are closer to being
correct for resolution shells that are fairly incomplete. Anyone else
remember this for small molecule structures?

The real issue, generally, is that the phases are the most important
factor in making a good map, and the structure factors are, unfortunately,
weaker contributors to features in the maps.

Bernie




On Fri, March 23, 2007 4:02 am, Eleanor Dodson wrote:
 This is a good point - I had thought that D would be very low for an
 incomplete shell, but that doesn't seem to be true..

 Garib - what do you think?
 Eleanor


 Petrus H Zwart wrote:
 I typically process my data to a maximum I/sig near 1, and
 completeness in
 the highest resolution shell to 50% or greater. It


 What about maps computed of very incomplete datasets at high resolution?
 Don't you get a false sense of details when the missing reflections are
 filled in with DFc when computing a 2MFo-DFc map?

 P






Re: [ccp4bb] Highest shell standards

2007-03-23 Thread Ian Tickle
I don't understand why D should be low for an incomplete shell.
According to Randy's tutorial:

D includes effects of:

difference in position or scattering factor

missing atoms

difference in overall scale or B-factor

i.e. all kinds of error in the SF model, but this is surely uncorrelated
with completeness?

-- Ian
 

 -Original Message-
 From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On 
 Behalf Of Eleanor Dodson
 Sent: 23 March 2007 09:03
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] Highest shell standards
 
 This is a good point - I had thought that D would be very low for an 
  incomplete shell, but that doesn't seem to be true.. 
 
 Garib - what do you think?
 Eleanor
 
 
 Petrus H Zwart wrote:
  I typically process my data to a maximum I/sig near 1, and 
  completeness in
  the highest resolution shell to 50% or greater. It
  
 
  What about maps computed of very incomplete datasets at 
 high resolution? Don't you get a false sense of details when 
 the missing reflections are filled in with DFc when computing 
 a 2MFo-DFc map?
 
  P
 
 

 
 



Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Ranvir Singh
I will agree with Ulrich. Even at 3.0 A, it is
possible to have a structure with reasonable accuracy
which can explain the biological function or is
consistent with available biochemical data.
Ranvir
--- Ulrich Genick [EMAIL PROTECTED] wrote:

 Here are my 2-3 cents worth on the topic:
 
 The first thing to keep in mind is that the goal of
 a structure  
 determination
 is not to get the best stats or to claim the highest
 possible  
 resolution.
 The goal is to get the best possible structure and
 to be confident that
 observed features in a structure are real and not
 the result of noise.
 
  From that perspective, if any of the conclusions
 one draws from a  
 structure
 change depending on whether one includes data with
 an I/sigI in the  
 highest
 resolution shell of 2 or 1, one probably treads on
 thin ice.
 
 The general guide that one should include only data,
 for which the  
 shell's average
   I/sigI > 2 comes from the following simple
 consideration.
 
 
 F/sigF = 2 I/sigI
 
 So if you include data with an I/sigI of 2 then your
 F/sigF =4.  In  
 other words you will
 have a roughly 25% experimental uncertainty in your
 F.
 Now assume that you actually knew the structure of
 your protein and  
 you would
 calculate the crystallographic R-factor between the
 Fcalcs from your  
 true structure and the
 observed F.
 In this situation, you would expect to get a
 crystallographic R- 
 factor around 25%,
 simply because of the average error in your
 experimental structure  
 factor.
 Since most macromolecular structures have R-factors
 around 20%, it  
 makes little
 sense to include data, where the experimental
 uncertainty alone will
 guarantee that your R-factor will be worse.
 Of course, these days maximum-likelihood refinement
 will just down  
 weight
 such data and all you do is to burn CPU cycles.
 
 
 If you actually want to do a semi rigorous test of
 where you should stop
 including data, simply include increasingly higher
 resolution data in  
 your
 refinement and see if your structure improves.
 If you have really high resolution data (i.e. 
 better than 1.2 Angstrom)
 you can do matrix inversion in SHELX and get
 estimated standard  
 deviations (esd)
 for your refined parameters. As you include more and
 more data the  
 esds should
 initially decrease. Simply keep including higher
 resolution data  
 until your esds
 start to increase again.
 
 Similarly, for lower resolution data you can monitor
 some molecular  
 parameters, which are not
 included in the stereochemical restraints and see,
 if the inclusion  
 of higher-resolution data makes the
 agreement between the observed and expected
 parameters better. For  
 example SHELX does not
 restrain torsion angles in aliphatic portions of
 side chains. If your  
 structure improves, those
 angles should cluster more tightly around +60 -60
 and 180...
 
 
 
 
 Cheers,
 
 Ulrich
 
 
  Could someone point me to some standards for data
 quality,  
  especially for publishing structures? I'm
 wondering in particular  
  about highest shell completeness, multiplicity,
 sigma and Rmerge.
 
  A co-worker pointed me to a '97 article by
 Kleywegt and Jones:
 
  http://xray.bmc.uu.se/gerard/gmrp/gmrp.html
 
  To decide at which shell to cut off the
 resolution, we nowadays  
  tend to use the following criteria for the highest
 shell:  
  completeness > 80%, multiplicity > 2, more than 60% of the
  reflections with I > 3 sigma(I), and Rmerge < 40%. In our opinion,
  it is better to have a good 1.8 Å structure than a poor 1.637 Å
  structure.
 
  Are these recommendations still valid with maximum
 likelihood  
  methods? We tend to use more data, especially in
 terms of the  
  Rmerge and sigma cutoff.
 
  Thanks in advance,
 
  Shane Atwell
 
 



 



Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Santarsiero, Bernard D.
There are journals that have specific specifications for these parameters,
so it matters where you publish. I've seen restrictions that the highest
resolution shell has to have I/sig > 2 and completeness > 90%. Your
mileage may vary.

I typically process my data to a maximum I/sig near 1, and completeness in
the highest resolution shell to 50% or greater. It's reasonable to expect
the multiplicity/redundancy to be greater than 2, though that is difficult
with the lower symmetry space groups in triclinic and monoclinic systems
(depending upon crystal orientation and detector geometry). The chi^2's
should be relatively uniform over the entire resolution range, near 1 in
the highest resolution bins, and near 1 overall. With this set of
criteria, R(merge)/R(sym) (on I) can be as high as 20% and near 100% for
the highest resolution shell. R is a poor descriptor when you have a
substantial number of weak intensities because it is dominated by the
denominator; chi^2's are a better descriptor since they have, essentially,
the same numerator.
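
That contrast between R and chi^2 for weak shells can be seen in a toy merge;
the definitions below are simplified textbook forms, not the exact formulas
of any particular scaling program:

import numpy as np

# For weak shells, Rmerge blows up because its denominator (the summed
# intensities) collapses, while a chi^2-type statistic stays near 1 as long
# as the sigmas are honest.  Simplified definitions used here:
#   Rmerge = sum |I_i - <I>| / sum I_i
#   chi^2  = sum ((I_i - <I>)/sigma)^2 / (N_refl * (n_obs - 1))

rng = np.random.default_rng(2)
n_refl, n_obs, sigma = 2000, 4, 1.0

def merge_stats(mean_true_I):
    I_true = rng.exponential(mean_true_I, size=n_refl)        # Wilson-like intensities
    obs = I_true[:, None] + rng.normal(0.0, sigma, size=(n_refl, n_obs))
    I_mean = obs.mean(axis=1, keepdims=True)
    rmerge = np.abs(obs - I_mean).sum() / obs.sum()
    chi2 = (((obs - I_mean) / sigma) ** 2).sum() / (n_refl * (n_obs - 1))
    return rmerge, chi2

for mean_I in (20.0, 2.0, 0.5):                               # strong -> weak shell
    r, c = merge_stats(mean_I)
    print(f"<I>/sigma ~ {mean_I / sigma:4.1f}:  Rmerge = {100 * r:6.1f}%   chi^2 = {c:4.2f}")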

One should also note that the I/sig criteria can be misleading. It is the
*average* of the I/sig in a resolution shell, and as such, will include
intensities that are both weaker and stronger than the average. For the
highest resolution shell, if you discard it because the average I/sig is
less than 2, then you are also discarding intensities substantially greater
than 2 sigma as well. The natural falloff of the intensities is reflected (no pun
intended) by the average B-factor of the structure, and you need the
higher resolution, weaker data to best define that parameter.

Protein diffraction data is inherently weak, and far weaker than we obtain
for small molecule crystals. Generally, we need all the data we can get,
and the dynamic range of the data that we do get is smaller than that
observed for small molecule crystals. That's why we use restraints in
refinement. An observation of a weak intensity is just as valid as an
observation of a strong intensity, since you are minimizing a function
related to matching Iobs to Icalc. This is even more valid with refinement
targets like the maximum likelihood function. The ONLY reasons that we
ever used I/sig or F/sig cutoffs in refinements were to make the
calculations faster (since we were substantially limited by computing
power decades ago), the sig's were not well-defined for weak intensities
(especially for F's), and the detectors were not as sensitive. Now, with
high brilliance x-ray sources and modern detectors, you can, in fact,
measure weak intensities well--far better than we could decades ago. And
while the dynamic range of intensities for a protein set is relatively
flat, in comparison to a small molecule dataset, those weak terms near
zero are important in restraining the Fcalc's to be small, and therefore
helping to define the phases properly.

In 2007, I don't see a valid argument for severe cutoffs in I/sig at the
processing stage. I/sig = 1 and a reasonable completeness of 30-50% in the
highest resolution shell should be adequate to include most of the useful
data. Later on, during refinement, you can, indeed, increase the
resolution limit, if you wish. Again, with targets like maximum
likelihood, there is no statistical reason to do that. You do it because
it makes the R(cryst), R(free), and FOM look better. You do it because you
want to have a 2.00A vs. 1.96A resolution structure. What is always true
is that you need to look at the maps, and they need as many terms in the
Fourier summation as you can include. There should never be an argument
that you're saving on computing cycles. It takes far longer to look
carefully at an electron density map and make decisions on what to do than
to carry out refinement. We're rarely talking about twice the computing
time, we're probably thinking 10% more. That's definitely not a reason to
throw out data. We've got lots of computing power and lots of disk
storage, so let's use them to our advantage.

That's my nickel.

Bernie Santarsiero




On Thu, March 22, 2007 7:00 am, Ranvir Singh wrote:
 I will agree with Ulrich. Even at 3.0 A, it is
 possible to have a  structure with reasonable accuracy
 which can explain the biological function/ or is
 consistent with available biochemical data.
 Ranvir
 --- Ulrich Genick [EMAIL PROTECTED] wrote:

 Here are my 2-3 cents worth on the topic:

 The first thing to keep in mind is that the goal of
 a structure
 determination
 is not to get the best stats or to claim the highest
 possible
 resolution.
 The goal is to get the best possible structure and
 to be confident that
 observed features in a structure are real and not
 the result of noise.

  From that perspective, if any of the conclusions
 one draws from a
 structure
 change depending on whether one includes data with
 an I/sigI in the
 highest
 resolution shell of 2 or 1, one probably treads on
 thin ice.

 The general guide that one should include only data,
 for which the
 shell's average
   I/sigI > 2 

Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Sue Roberts
I have a question about how the experimental sigmas are affected when  
one includes resolution shells containing mostly unobserved  
reflections.  Does this vary with the data reduction software being  
used?


One thing I've noticed when scaling data (this with d*trek (Crystal  
Clear) since it's the program I use most) is that I/sigma(I) of  
reflections can change significantly when one changes the high  
resolution cutoff.


If I set the detector so that the edge is about where I stop seeing  
reflections and integrate to the corner of the detector, I'll get a  
dataset where I/sigma(I) is really compressed - there is a lot of  
high resolution data with I/sigma(I) about 1, but for the lowest  
resolution shell, the overall I/sigma(I) will be maybe 8-9.  If the  
data set is cut off at a lower resolution (where I/sigma(I) in the  
shell is about 2) and scaled, I/sigma(I) in the lowest resolution  
shell will be maybe 20 or even higher (OK, there is a different  
resolution cutoff for this shell, but if I look at individual  
reflections, the trend holds).  Since the maximum likelihood  
refinements use sigmas for weighting this must affect the  
refinement.  My experience is that interpretation of the maps is  
easier when the cut-off datasets are used. (Refinement is via refmac5  
or shelx).  Also, I'm mostly talking about datasets from well-diffracting
crystals (better than 2 A).


Sue


On Mar 22, 2007, at 2:29 AM, Eleanor Dodson wrote:

I feel that is rather severe for ML refinement - sometimes for  
instance it helps to use all the data from the images, integrating  
right into the corners, thus getting a very incomplete set for the  
highest resolution shell.  But for exptl phasing it does not help  
to have many many weak reflections..


Is there any way of testing this though? Only way I can think of to  
refine against a poorer set with varying protocols, then improve  
crystals/data and see which protocol for the poorer data gave the  
best agreement for the model comparison?


And even that is not decisive - presumably the data would have come  
from different crystals with maybe small diffs between the models..

Eleanor



Shane Atwell wrote:


Could someone point me to some standards for data quality,  
especially for publishing structures? I'm wondering in particular  
about highest shell completeness, multiplicity, sigma and Rmerge.


A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

To decide at which shell to cut off the resolution, we nowadays  
tend to use the following criteria for the highest shell:  
completeness > 80%, multiplicity > 2, more than 60% of the  
reflections with I > 3 sigma(I), and Rmerge < 40%. In our  
opinion, it is better to have a good 1.8 Å structure than a poor  
1.637 Å structure.


Are these recommendations still valid with maximum likelihood  
methods? We tend to use more data, especially in terms of the  
Rmerge and sigma cutoff.


Thanks in advance,

*Shane Atwell*



Sue Roberts
Biochemistry & Biophysics
University of Arizona

[EMAIL PROTECTED]


Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Petrus H Zwart
 I typically process my data to a maximum I/sig near 1, and 
 completeness in
 the highest resolution shell to 50% or greater. It

What about maps computed of very incomplete datasets at high resolution? Don't 
you get a false sense of details when the missing reflections are filled in 
with DFc when computing a 2MFo-DFc map?

P


Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Jose Antonio Cuesta-Seijo
I have observed something similar myself, using Saint with a Bruker  
Smart6K detector and denzo with lab and synchrotron detectors.
First, the I over sigma never really drops to zero, no matter how much  
over your real resolution limit you integrate.
Second, if I integrate to the visual resolution limit of, say, 1.5A,  
I get nice dataset statistics. If I now re-integrate (and re-scale)  
to 1.2A, thus including mostly empty (background) pixels everywhere,  
then cut the dataset after scaling to the same 1.5A limit, the  
statistics are much worse, both in I over sigma and Rint. (Sorry, no  
numbers here, I tried this some time ago.)
I guess the integration is suffering at the profile-fitting level, while  
the scaling suffers from general noise (those weak reflections  
between 1.5A and 1.2A will be half of your total data!).

I would be careful about going much over the visual resolution limit.
Jose.

**
Jose Antonio Cuesta-Seijo
Cancer Genomics and Proteomics
Ontario Cancer Institute, UHN
MaRs TMDT Room 4-902M
101 College Street
M5G 1L7 Toronto, On, Canada
Phone:  (416)581-7544
Fax: (416)581-7562
email: [EMAIL PROTECTED]
**


On Mar 22, 2007, at 10:59 AM, Sue Roberts wrote:

I have a question about how the experimental sigmas are affected  
when one includes resolution shells containing mostly unobserved  
reflections.  Does this vary with the data reduction software being  
used?


One thing I've noticed when scaling data (this with d*trek (Crystal  
Clear) since it's the program I use most) is that I/sigma(I) of  
reflections can change significantly when one changes the high  
resolution cutoff.


If I set the detector so that the edge is about where I stop seeing  
reflections and integrate to the corner of the detector, I'll get a  
dataset where I/sigma(I) is really compressed - there is a lot of  
high resolution data with I/sigma(I) about 1, but for the lowest  
resolution shell, the overall I/sigma(I) will be maybe 8-9.  If the  
data set is cutoff at a lower resolution (where I/sigma(I) in the  
shell is about 2) and scaled, I/sigma(I) in the lowest resolution  
shell will be maybe 20 or even higher (OK, there is a different  
resolution cutoff for this shell, but if I look at individual  
reflections, the trend holds).  Since the maximum likelihood  
refinements use sigmas for weighting this must affect the  
refinement.  My experience is that interpretation of the maps is  
easier when the cut-off datasets are used. (Refinement is via  
refmac5 or shelx).  Also, I'm mostly talking about datasets from   
well-diffracting crystals (better than 2 A).


Sue


On Mar 22, 2007, at 2:29 AM, Eleanor Dodson wrote:

I feel that is rather severe for ML refinement - sometimes for  
instance it helps to use all the data from the images, integrating  
right into the corners, thus getting a very incomplete set for the  
highest resolution shell.  But for exptl phasing it does not help  
to have many many weak reflections..


Is there any way of testing this though? Only way I can think of  
to refine against a poorer set with varying protocols, then  
improve crystals/data and see which protocol for the poorer data  
gave the best agreement for the model comparison?


And even that is not decisive - presumably the data would have  
come from different crystals with maybe small diffs between the  
models..

Eleanor



Shane Atwell wrote:


Could someone point me to some standards for data quality,  
especially for publishing structures? I'm wondering in particular  
about highest shell completeness, multiplicity, sigma and Rmerge.


A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

To decide at which shell to cut off the resolution, we nowadays  
tend to use the following criteria for the highest shell:  
completeness > 80%, multiplicity > 2, more than 60% of the  
reflections with I > 3 sigma(I), and Rmerge < 40%. In our  
opinion, it is better to have a good 1.8 Å structure than a poor  
1.637 Å structure.


Are these recommendations still valid with maximum likelihood  
methods? We tend to use more data, especially in terms of the  
Rmerge and sigma cutoff.


Thanks in advance,

*Shane Atwell*



Sue Roberts
Biochemistry & Biophysics
University of Arizona

[EMAIL PROTECTED]


Re: [ccp4bb] Highest shell standards

2007-03-22 Thread Santarsiero, Bernard D.
My guess is that the integration is roughly the same, unless the profiles
are really poorly defined, but that it is the scaling that suffers from
using a lot of high-resolution weak data. We've integrated data to, say,
I/sig = 0.5, and sometimes see more problems with scaling. I then cut
back to I/sig = 1 and it's fine. The major difficulty arises if the
crystal is dying and the decay/scaling/absorption model isn't good
enough. So that's definitely a consideration when trying to get a more
complete data set and higher resolution (so more redundancy).

Bernie


On Thu, March 22, 2007 12:21 pm, Jose Antonio Cuesta-Seijo wrote:
 I have observed something similar myself using Saint in a Bruker
 Smart6K detector and using denzo in lab and synchrotron detectors.
 First the I over sigma never really drops to zero, no matter how much
 over your real resolution limit you integrate.
 Second, if I integrate to the visual resolution limit of, say, 1.5A,
 I get nice dataset statistics. If I now re-integrate (and re-scale)
 to 1.2A, thus including mostly empty (background) pixels everywhere,
 then cut the dataset after scaling to the same 1.5A limit, the
 statistics are much worse, both in I over sigma and Rint. (Sorry, no
 numbers here, I tried this sometime ago).
 I guess the integration is suffering at profile fitting level while
 the scaling suffers from general noise (those weak reflections
 between 1.5A and 1.2A will be half of your total data!).
 I would be careful to go much over the visual resolution limit.
 Jose.

 **
 Jose Antonio Cuesta-Seijo
 Cancer Genomics and Proteomics
 Ontario Cancer Institute, UHN
 MaRs TMDT Room 4-902M
 101 College Street
 M5G 1L7 Toronto, On, Canada
 Phone:  (416)581-7544
 Fax: (416)581-7562
 email: [EMAIL PROTECTED]
 **


 On Mar 22, 2007, at 10:59 AM, Sue Roberts wrote:

 I have a question about how the experimental sigmas are affected
 when one includes resolution shells containing mostly unobserved
 reflections.  Does this vary with the data reduction software being
 used?

 One thing I've noticed when scaling data (this with d*trek (Crystal
 Clear) since it's the program I use most) is that I/sigma(I) of
 reflections can change significantly when one changes the high
 resolution cutoff.

 If I set the detector so that the edge is about where I stop seeing
 reflections and integrate to the corner of the detector, I'll get a
 dataset where I/sigma(I) is really compressed - there is a lot of
 high resolution data with I/sigma(I) about 1, but for the lowest
 resolution shell, the overall I/sigma(I) will be maybe 8-9.  If the
 data set is cutoff at a lower resolution (where I/sigma(I) in the
 shell is about 2) and scaled, I/sigma(I) in the lowest resolution
 shell will be maybe 20 or even higher (OK, there is a different
 resolution cutoff for this shell, but if I look at individual
 reflections, the trend holds).  Since the maximum likelihood
 refinements use sigmas for weighting this must affect the
 refinement.  My experience is that interpretation of the maps is
 easier when the cut-off datasets are used. (Refinement is via
 refmac5 or shelx).  Also, I'm mostly talking about datasets from
 well-diffracting crystals (better than 2 A).

 Sue


 On Mar 22, 2007, at 2:29 AM, Eleanor Dodson wrote:

 I feel that is rather severe for ML refinement - sometimes for
 instance it helps to use all the data from the images, integrating
 right into the corners, thus getting a very incomplete set for the
 highest resolution shell.  But for exptl phasing it does not help
 to have many many weak reflections..

 Is there any way of testing this though? Only way I can think of
 to refine against a poorer set with varying protocols, then
 improve crystals/data and see which protocol for the poorer data
 gave the best agreement for the model comparison?

 And even that is not decisive - presumably the data would have
 come from different crystals with maybe small diffs between the
 models..
 Eleanor



 Shane Atwell wrote:

 Could someone point me to some standards for data quality,
 especially for publishing structures? I'm wondering in particular
 about highest shell completeness, multiplicity, sigma and Rmerge.

 A co-worker pointed me to a '97 article by Kleywegt and Jones:

 http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

 To decide at which shell to cut off the resolution, we nowadays
 tend to use the following criteria for the highest shell:
 completeness > 80%, multiplicity > 2, more than 60% of the
 reflections with I > 3 sigma(I), and Rmerge < 40%. In our
 opinion, it is better to have a good 1.8 Å structure than a poor
 1.637 Å structure.

 Are these recommendations still valid with maximum likelihood
 methods? We tend to use more data, especially in terms of the
 Rmerge and sigma cutoff.

 Thanks in advance,

 *Shane Atwell*


 Sue Roberts
 Biochemistry & Biophysics
 University of Arizona

 [EMAIL PROTECTED]



[ccp4bb] Highest shell standards

2007-03-21 Thread Shane Atwell
Could someone point me to some standards for data quality, especially for 
publishing structures? I'm wondering in particular about highest shell 
completeness, multiplicity, sigma and Rmerge.

A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

To decide at which shell to cut off the resolution, we nowadays tend to use 
the following criteria for the highest shell: completeness > 80%, multiplicity 
> 2, more than 60% of the reflections with I > 3 sigma(I), and Rmerge < 40%. 
In our opinion, it is better to have a good 1.8 Å structure than a poor 1.637 
Å structure.

Are these recommendations still valid with maximum likelihood methods? We tend 
to use more data, especially in terms of the Rmerge and sigma cutoff.

Thanks in advance,

Shane Atwell



Re: [ccp4bb] Highest shell standards

2007-03-21 Thread Bart Hazes

Shane Atwell wrote:
Could someone point me to some standards for data quality, especially 
for publishing structures? I'm wondering in particular about highest 
shell completeness, multiplicity, sigma and Rmerge.


A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

To decide at which shell to cut off the resolution, we nowadays tend to 
use the following criteria for the highest shell: completeness > 80%, 
multiplicity > 2, more than 60% of the reflections with I > 3 sigma(I), 
and Rmerge < 40%. In our opinion, it is better to have a good 1.8 Å 
structure than a poor 1.637 Å structure.


Are these recommendations still valid with maximum likelihood methods? 
We tend to use more data, especially in terms of the Rmerge and sigma 
cutoff.


Thanks in advance,

*Shane Atwell*



Hi Shane,

I definitely no longer support the conclusions from that 1997 paper and 
I think Gerard probably has adjusted his thoughts on this matter as 
well. Leaving out the data beyond 1.8A (in the example above) only makes 
sense if there is no information in those data. Completeness and 
multiplicity are not direct measures of data quality, and the 60% 
I > 3 sigma(I) and Rmerge < 40% criteria are too strict for my liking. I prefer 
to look at I/SigI mostly, and as a reviewer I have no problems with 
highest resolution shell stats with I/SigI anywhere in the 1.5-2.5 
range. I won't complain about higher I/SigI values if done for the right 
reasons (phasing data sets being the most common), but will say 
something if they state their crystals diffract to 2.5A when the I/SigI in 
the highest resolution shell is, let's say, 5. Their crystals don't 
diffract to 2.5A; they just didn't let the crystals diffract to their 
full potential. You can't really reject papers for that reason, but 
there appears to be a conservative epidemic when it comes to restricting 
the resolution of the data set.


Bart

--

==

Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:1-780-492-7521

==


Re: [ccp4bb] Highest shell standards

2007-03-21 Thread Ulrich Genick

Here are my 2-3 cents worth on the topic:

The first thing to keep in mind is that the goal of a structure  
determination
is not to get the best stats or to claim the highest possible  
resolution.

The goal is to get the best possible structure and to be confident that
observed features in a structure are real and not the result of noise.

From that perspective, if any of the conclusions one draws from a  
structure
change depending on whether one includes data with an I/sigI in the  
highest

resolution shell of 2 or 1, one probably treads on thin ice.

The general guide that one should include only data, for which the  
shell's average

 I/sigI > 2 comes from the following simple consideration.


F/sigF = 2 I/sigI

So if you include data with an I/sigI of 2 then your F/sigF =4.  In  
other words you will

have a roughly 25% experimental uncertainty in your F.
Now assume that you actually knew the structure of your protein and  
you would
calculate the crystallographic R-factor between the Fcalcs from your  
true structure and the

observed F.
In this situation, you would expect to get a crystallographic R- 
factor around 25%,
simply because of the average error in your experimental structure  
factor.
Since most macromolecular structures have R-factors around 20%, it  
makes little

sense to include data, where the experimental uncertainty alone will
guarantee that your R-factor will be worse.
Of course, these days maximum-likelihood refinement will just down  
weight

such data and all you do is to burn CPU cycles.


If you actually want to do a semi rigorous test of where you should stop
including data, simply include increasingly higher resolution data in  
your

refinement and see if your structure improves.
If you have really high resolution data (i.e.  better than 1.2 Angstrom)
you can do matrix inversion in SHELX and get estimated standard  
deviations (esd)
for your refined parameters. As you include more and more data the  
esds should
initially decrease. Simply keep including higher resolution data  
until your esds

start to increase again.

Similarly, for lower resolution data you can monitor some molecular  
parameters, which are not
included in the stereochemical restraints and see, if the inclusion  
of higher-resolution data makes the
agreement between the observed and expected parameters better. For  
example SHELX does not
restrain torsion angles in aliphatic portions of side chains. If your  
structure improves, those

angles should cluster more tightly around +60 -60 and 180...
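
One way to turn that last check into a number is to score how tightly the
unrestrained torsions cluster around the ideal +60/-60/180 values for each
trial resolution cutoff; a sketch, with made-up angle lists standing in for
torsions measured from the refined models:

import numpy as np

IDEAL = np.array([60.0, -60.0, 180.0])

def clustering_score(torsions_deg):
    """Mean absolute deviation (degrees) from the nearest ideal rotamer value."""
    t = np.asarray(torsions_deg, dtype=float)[:, None]
    diff = (t - IDEAL[None, :] + 180.0) % 360.0 - 180.0   # circular difference
    return np.abs(diff).min(axis=1).mean()

# Made-up example: torsions from three hypothetical refinements with
# different high-resolution cutoffs.
rng = np.random.default_rng(3)
trials = {
    "cut at 1.6 A": np.concatenate([rng.normal(m, 14.0, 200) for m in IDEAL]),
    "cut at 1.4 A": np.concatenate([rng.normal(m, 10.0, 200) for m in IDEAL]),
    "cut at 1.3 A": np.concatenate([rng.normal(m, 11.0, 200) for m in IDEAL]),
}
for label, torsions in trials.items():
    print(f"{label}:  mean deviation from ideal = {clustering_score(torsions):.1f} deg")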




Cheers,

Ulrich


Could someone point me to some standards for data quality,  
especially for publishing structures? I'm wondering in particular  
about highest shell completeness, multiplicity, sigma and Rmerge.


A co-worker pointed me to a '97 article by Kleywegt and Jones:

http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

To decide at which shell to cut off the resolution, we nowadays  
tend to use the following criteria for the highest shell:  
completeness > 80%, multiplicity > 2, more than 60% of the  
reflections with I > 3 sigma(I), and Rmerge < 40%. In our opinion,  
it is better to have a good 1.8 Å structure than a poor 1.637 Å  
structure.


Are these recommendations still valid with maximum likelihood  
methods? We tend to use more data, especially in terms of the  
Rmerge and sigma cutoff.


Thanks in advance,

Shane Atwell