Re: [ccp4bb] Highest shell standards
On Fri, 23 Mar 2007, Edward Berry wrote:

> I believe fft does not by default do this - only if you use the FILLIN keyword? One place where it might be important is in density modification / molecular averaging. Molecular averaging can be seen as a numerical solution of the MR equations: finding a density map which (A) obeys the NCS/intercrystal symmetry and (B) yields the observed F's upon Fourier transformation. Now if on each cycle you set the missing F's to zero, you are requiring, as part of (B), zero amplitude for the missing reflections, which is more restrictive and incorrect. If instead you allow the missing F's to float, calculating them on each cycle from the previous map using the FILLIN option, someone has shown (I don't have the reference handy at the moment) that the F's tend toward the true F's (in the case that they weren't really missing but were omitted as part of a test).
>
> Ed

There are phase scatter plots in Acta Cryst. D51, 575-589 (1995) that show just that: the map-inversion phases from NCS averaging tend toward the true phases. Since F's are phased quantities, and since phases are more important than amplitudes, non-random amplitudes plus non-random phases (both from map inversion of averaged maps) lead to better electron density maps.

Fred.

--
Fred. Vellieux, esq.
IBS J.-P. Ebel CEA CNRS UJF / LBM
41 rue Jules Horowitz
38027 Grenoble Cedex 01, France
Tel: (+33) (0) 438789605 / Fax: (+33) (0) 438785494
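[Editor's note: Ed's "let the missing F's float" scheme is essentially an alternating-projection iteration, and is easy to demonstrate in miniature. The following is a minimal 1D toy sketch in Python - solvent flattening stands in for NCS averaging, and none of this is the actual FILLIN code of any real program:]

import numpy as np

# Toy demonstration: a map is repeatedly modified in real space (here a
# known "solvent" region is flattened, standing in for NCS averaging),
# inverted, and the *measured* F's are restored while the F's of "missing"
# reflections are kept from the map inversion, i.e. allowed to float.
rng = np.random.default_rng(0)
n = 64
true_map = np.zeros(n)
true_map[20:30] = rng.random(10) + 1.0       # the "molecule" region
F_true = np.fft.fft(true_map)

missing = rng.random(n) < 0.2                # ~20% of reflections unmeasured
missing[0] = False                           # keep F(000)

F = F_true.copy()
F[missing] = 0.0                             # start with missing F's at zero

for cycle in range(50):
    rho = np.fft.ifft(F).real
    rho[:20] = 0.0                           # density modification:
    rho[30:] = 0.0                           # flatten everything outside
    F_new = np.fft.fft(rho)                  # the "molecule"
    F_new[~missing] = F_true[~missing]       # restore measured data;
    F = F_new                                # missing reflections float

# the floated amplitudes move toward the true values over the cycles
err = np.abs(np.abs(F[missing]) - np.abs(F_true[missing])).mean()
print(f"mean amplitude error on missing reflections: {err:.4f}")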
Re: [ccp4bb] Highest shell standards
You are probably referring to the following works: Caliandro et al., Acta Cryst. D61 (2005) 556-565, and Caliandro et al., Acta Cryst. D61 (2005) 1080-1087, in which they used density modification to calculate phases for unmeasured reflections, then used those phases to extend the resolution by calculating rough estimates of the unmeasured amplitudes. Using this technique they could actually improve the electron density. If I'm not mistaken, George Sheldrick has implemented this "free lunch" algorithm in SHELXE.

/Michel

On Fri, 2007-03-23 at 08:05 -0800, Edward Berry wrote:
> If instead you allow the missing F's to float, calculating them on each cycle from the previous map using the FILLIN option, someone has shown (I don't have the reference handy at the moment) that the F's tend toward the true F's (in the case that they weren't really missing but were omitted as part of a test).
>
> Ed
Re: [ccp4bb] Highest shell standards
Actually I was thinking of a somewhat earlier paper:

Rayment, I. Molecular replacement method at low resolution: optimum strategy and intrinsic limitations as determined by calculations on icosahedral virus models. Acta Crystallogr. A39, 102-116 (1983).

But thanks for bringing the Caliandro et al. papers to my attention. Thanks also to Fred. Vellieux for his comments, and to Pete Dunton for explaining to me that while fft doesn't do fill-in by default, the 2mFo-DFc map coefficients from refmac5 do have fill-in values for the missing reflections, making model bias a problem when many missing reflections are included. Now I understand Petrus's question.

Ed

Michel Fodje wrote:
> You are probably referring to the following works: Caliandro et al., Acta Cryst. D61 (2005) 556-565, and Caliandro et al., Acta Cryst. D61 (2005) 1080-1087 [...] [quoted message snipped; see above]
Re: [ccp4bb] Highest shell standards
Isn't automatically including fabricated data for missing reflections a really bad idea for anisotropic data, where most reflections are missing at high resolution? Shouldn't there be a big flashing red flag alerting the user to what's been done?

Phoebe

At 01:22 PM 3/26/2007, Edward A. Berry wrote:
> Actually I was thinking of a somewhat earlier paper: Rayment, I. [...] Acta Crystallogr. A39, 102-116 (1983). [earlier messages snipped; see above]

---
Phoebe A. Rice
Assoc. Prof., Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723 / fax 773 702 0439
http://bmb.bsd.uchicago.edu/index.html
http://www.nasa.gov/mission_pages/cassini/multimedia/pia06064.html
Re: [ccp4bb] Highest shell standards
I very much agree with Holton. I also find the following to be a simple and very helpful argument in the discussion (it was given to me by Morten Kjeldgaard): your model should reproduce your weak data by weak Fc's - therefore you need your weak reflections in the refinement.

I personally like to judge a proper data cutoff from the Wilson plot - as long as it looks right, the data are OK to use. If on the other hand the plot changes slope, levels out or even rises at some point, then I cut the data there. The plot will often show a nice appearance down to an I/sigI level of about 1-1.5, other times 2 or 3 - it really depends on the crystal (and on the assumption of valid Wilson statistics, of course).

Poul

James Holton wrote:
> I generally cut off integration at the shell where I/sigI < 0.5 and then cut off the merged data where Mn(I)/sd(I) ~ 1.5. It is always easier to cut off data later than to re-integrate it. I never look at the Rmerge, Rsym, Rpim or R-whatever in the highest resolution shell. This is because R statistics are inappropriate for weak data. Don't believe me? If anybody out there doesn't think that spots with no intensity are important, then why are you looking so carefully at your systematic absences? ;) The "Rabsent" statistic (if it existed) would always be dividing by zero, giving wildly varying numbers > 100% (unless your absences really do have intensity, but then they are not absent, are they?).
>
> There is information in the intensity of an "absent" spot (a systematic absence, or any spot beyond your true resolution limit). Unfortunately, measuring zero is hard, because the signal-to-noise ratio will always be... zero. Statistics as we know it seems to fear this noise > signal domain. For example, the error propagation Ulrich pointed out (F/sigF = 2 I/sigI) breaks down as I approaches zero. If you take F = 0, add random noise to it and then square it, you will get an average value for I = F^2 that always equals the square of the noise you added. It will never be zero, no matter how much averaging you do. Going the other way is problematic because if I really is zero, then half of your measurements of it will be negative (and sqrt(I) will be imaginary (ha ha)). This is the problem TRUNCATE tries to solve.
>
> Despite these difficulties, IMHO, cutting out weak data from a ML refinement is a really bad idea. This is because there is a big difference between "1 +/- 10" and "I don't know, could be anything" when you are fitting a model to data - ESPECIALLY when your data/parameters ratio is already ~1.0 or less. The DIFFERENCE between Fobs and Fcalc, relative to the uncertainty of Fobs, is what determines whether or not your model is correct to within experimental error. If weak, high-resolution data are left out, then they can become a dumping ground for model bias. Indeed, there are some entries in the PDB (particularly those pre-dating when we knew how to restrain B factors properly) that show an up-turn in intensity beyond the quoted resolution cutoff (if you look at the Wilson plot of Fcalc). This is because the refinement program was allowed to make Fcalc beyond the resolution cutoff anything it wanted (and it did).
>
> The only time I think cutting out data because it is weak is appropriate is for map calculations. Leaving an HKL out of the map is the same as assigning it zero (unless it is a sigma-A map that fills in with Fcalcs). In maps, weak data (I/sd < 1) will (by definition) add more noise than signal.
> In fact, calculating an anomalous difference Patterson with DANO/SIGDANO as the coefficients instead of DANO can often lead to better maps.
>
> Yes, your Rmerge, Rcryst and Rfree will all go up if you include weak data in your scaling and refinement, but the accuracy of your model will improve. If you (or your reviewer) are worried about this, I suggest using the old, traditional 3-sigma cutoff for the data used to calculate R. Keep the anachronisms together. Yes, the PDB allows this. In fact, (last time I checked) you are asked to enter what sigma cutoff you used for your R factors. In the last 100 days (3750 PDB depositions), the REMARK 3 DATA CUTOFF stats are thus:
>
>   sigma cutoff    popularity
>   NULL            13.84%
>   NONE            13.65%
>   -2.5 to -1.5     0.37%
>   -0.5 to  0.5    62.48%
>    0.5 to  1.5     2.03%
>    1.5 to  2.5     6.51%
>    2.5 to  3.5     0.61%
>    3.5 to  4.5     0.24%
>    > 4.5           0.27%
>
> So it would appear mine is not a popular attitude.
>
> -James Holton
> MAD Scientist

Shane Atwell wrote:
> Could someone point me to some standards for data quality, especially for publishing structures? I'm wondering in particular about highest shell completeness, multiplicity, sigma and Rmerge. A co-worker pointed me to a '97 article by Kleywegt and Jones: http://xray.bmc.uu.se/gerard/gmrp/gmrp.html "To decide at which shell to cut off the resolution, we nowadays tend to use the following criteria for the highest shell: completeness > 80 %, [...]
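[Editor's note: Holton's "measuring zero is hard" point is easy to verify numerically. A quick sketch (mine, not from the original post):]

import numpy as np

# If the true amplitude is F = 0, the measured I = (F + noise)^2 averages
# to the noise variance - never to zero - no matter how many measurements
# you make.
rng = np.random.default_rng(1)
sigma = 3.0                                   # noise added to F = 0
for n_obs in (10, 1_000, 100_000):
    I = rng.normal(0.0, sigma, n_obs) ** 2    # I = (0 + noise)^2
    print(f"n = {n_obs:>6}:  <I> = {I.mean():7.3f}  (sigma^2 = {sigma**2:.0f})")
# <I> converges to sigma^2 = 9, not to the true value of 0.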
Re: [ccp4bb] Highest shell standards
This is a good point - I had thought that D would be very low for an incomplete shell, but that doesn't seem to be true... Garib - what do you think?

Eleanor

Petrus H Zwart wrote:
>> I typically process my data to a maximum I/sig near 1, and completeness in the highest resolution shell to 50% or greater. It
>
> What about maps computed from very incomplete datasets at high resolution? Don't you get a false sense of detail when the missing reflections are filled in with DFc when computing a 2mFo-DFc map?
>
> P
Re: [ccp4bb] Highest shell standards
I seem to recall this being discussed at some point. For the difference electron density map, there clearly isn't a downside to the loss of reflections; i.e., the coefficients in the map generation are formally zero for Fo-Fc (with all the scaling, weight, sigma-A bits in there). If the phases are fairly good, then you are assuming that the Fobs agrees perfectly with the Fcalc. You don't gain any new details in the map, but the map isn't distorted by the loss of these terms.

As for the 2Fo-Fc (2mFo-DFc, or something like that) electron density map, it again assumes that the phases are in good shape, and you essentially lose any new information you could gain from the addition of new Fobs terms, but the map isn't distorted, since the terms are zero.

I seem to recall in the dark ages that you could make an Fobs map and include Fcalc, or some fraction of Fcalc (like 0.5Fc), as the Fobs term for missing reflections. That way the amplitudes are closer to being correct for resolution shells that are fairly incomplete. Anyone else remember this for small-molecule structures?

The real issue, generally, is that the phases are the most important factor in making a good map, and the structure factors are, unfortunately, weaker contributors to features in the maps.

Bernie

On Fri, March 23, 2007 4:02 am, Eleanor Dodson wrote:
> This is a good point - I had thought that D would be very low for an incomplete shell, but that doesn't seem to be true... Garib - what do you think? [earlier messages snipped; see above]
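[Editor's note: the two choices Bernie describes - drop a missing term (coefficient zero) or substitute the model term - can be sketched in a few lines. This is an illustration with hypothetical arrays and a hypothetical helper name; real programs estimate m and D per reflection and do the Fourier synthesis themselves:]

import numpy as np

# For measured reflections the 2mFo-DFc coefficient is 2*m*Fo - D*Fc.
# For a missing reflection you either set the coefficient to zero (leave
# it out of the Fourier sum) or substitute D*Fc, which reduces series-
# termination error but imports model bias.
def map_coeffs(fo, fc, m, D, measured, fill_missing=False):
    coeff = 2.0 * m * fo - D * fc                        # standard term
    if fill_missing:
        return np.where(measured, coeff, D * fc)         # DFc fill-in
    return np.where(measured, coeff, 0.0)                # omit = zero

fo = np.array([120.0, 85.0, 0.0, 40.0])                  # 3rd is unmeasured
fc = np.array([115.0, 90.0, 60.0, 35.0])
measured = np.array([True, True, False, True])
m, D = 0.9, 0.95                                         # illustrative values

print(map_coeffs(fo, fc, m, D, measured, fill_missing=False))
print(map_coeffs(fo, fc, m, D, measured, fill_missing=True))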
Re: [ccp4bb] Highest shell standards
I don't understand why D should be low for an incomplete shell. According to Randy's tutorial, D includes the effects of: differences in position or scattering factor; missing atoms; differences in overall scale or B-factor - i.e., all kinds of error in the SF model. But this is surely uncorrelated with completeness?

-- Ian

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Eleanor Dodson
Sent: 23 March 2007 09:03
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Highest shell standards

> This is a good point - I had thought that D would be very low for an incomplete shell, but that doesn't seem to be true... Garib - what do you think? [earlier messages snipped; see above]
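[Editor's note: for reference, the model behind D in the sigma-A formalism (my summary of the standard result, e.g. Read's papers - not part of Ian's post) relates the observed and calculated structure factors by]

F_{\mathrm{o}} = D\,F_{\mathrm{c}} + \epsilon ,
\qquad
D = \bigl\langle \cos\!\bigl(2\pi\,\mathbf{s}\cdot\Delta\mathbf{r}\bigr) \bigr\rangle
\quad \text{(for coordinate errors } \Delta\mathbf{r}\text{)} ,

[so D depends on model error and resolution, not on how many reflections in the shell happen to have been measured - consistent with Ian's point.]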
Re: [ccp4bb] Highest shell standards
I will agree with Ulrich. Even at 3.0 A, it is possible to have a structure with reasonable accuracy which can explain the biological function or is consistent with available biochemical data.

Ranvir

--- Ulrich Genick [EMAIL PROTECTED] wrote:
> Here are my 2-3 cents worth on the topic: The first thing to keep in mind is that the goal of a structure determination is not to get the best stats or to claim the highest possible resolution. The goal is to get the best possible structure and to be confident that observed features in a structure are real and not the result of noise. [rest snipped; Ulrich's full message appears below in this digest]
Re: [ccp4bb] Highest shell standards
There are journals that have specific requirements for these parameters, so it matters where you publish. I've seen restrictions that the highest resolution shell has to have I/sig > 2 and completeness > 90%. Your mileage may vary.

I typically process my data to a maximum I/sig near 1, and completeness in the highest resolution shell of 50% or greater. It's reasonable to expect the multiplicity/redundancy to be greater than 2, though that is difficult with the lower-symmetry space groups in triclinic and monoclinic systems (depending upon crystal orientation and detector geometry). The chi^2's should be relatively uniform over the entire resolution range, near 1 in the highest resolution bins, and near 1 overall. With this set of criteria, R(merge)/R(sym) (on I) can be as high as 20% overall and near 100% for the highest resolution shell. R is a poor descriptor when you have a substantial number of weak intensities, because it is dominated by the denominator; the chi^2's are a better descriptor, since they have, essentially, the same numerator.

One should also note that the I/sig criterion can be misleading. It is the *average* of the I/sig in a resolution shell, and as such will include intensities that are both weaker and stronger than the average. For the highest resolution shell, if you discard the shell because its average is below 2 sig, then you are also discarding individual intensities substantially greater than 2 sig. The natural falloff of the intensities is reflected (no pun intended) in the average B-factor of the structure, and you need the higher-resolution, weaker data to best define that parameter.

Protein diffraction data are inherently weak - far weaker than we obtain for small-molecule crystals. Generally, we need all the data we can get, and the dynamic range of the data that we do get is smaller than that observed for small-molecule crystals. That's why we use restraints in refinement. An observation of a weak intensity is just as valid as an observation of a strong one, since you are minimizing a function related to matching Iobs to Icalc. This is even more valid with refinement targets like the maximum-likelihood function. The ONLY reasons we ever used I/sig or F/sig cutoffs in refinement were to make the calculations faster (since we were substantially limited by computing power decades ago), because the sig's were not well defined for weak intensities (especially for F's), and because the detectors were not as sensitive. Now, with high-brilliance X-ray sources and modern detectors, you can in fact measure weak intensities well - far better than we could decades ago. And while the dynamic range of intensities for a protein dataset is relatively flat in comparison to a small-molecule dataset, those weak terms near zero are important in restraining the Fcalc's to be small, and therefore help to define the phases properly.

In 2007, I don't see a valid argument for severe cutoffs in I/sig at the processing stage. I/sig = 1 and a reasonable completeness of 30-50% in the highest resolution shell should be adequate to include most of the useful data. Later on, during refinement, you can, indeed, increase the resolution limit if you wish. Again, with targets like maximum likelihood, there is no statistical reason to do that. You do it because it makes the R(cryst), R(free), and FOM look better. You do it because you want to have a 2.00 A vs. 1.96 A resolution structure. What is always true is that you need to look at the maps, and they need as many terms in the Fourier summation as you can include.
There should never be an argument that you're saving on computing cycles. It takes far longer to look carefully at an electron density map and make decisions about what to do than to carry out refinement. We're rarely talking about twice the computing time; we're probably talking 10% more. That's definitely not a reason to throw out data. We've got lots of computing power and lots of disk storage - let's use them to our advantage.

That's my nickel.

Bernie Santarsiero

On Thu, March 22, 2007 7:00 am, Ranvir Singh wrote:
> I will agree with Ulrich. Even at 3.0 A, it is possible to have a structure with reasonable accuracy which can explain the biological function or is consistent with available biochemical data. [earlier messages snipped; see above]
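[Editor's note: Bernie's point that R is dominated by its denominator while chi^2 compares scatter to the sigmas can be checked with a small simulation (my own illustration, not from the post):]

import numpy as np

# Simulate repeated measurements of near-zero true intensities with
# honest, correctly-estimated sigmas: Rmerge blows up while chi^2 ~ 1.
rng = np.random.default_rng(2)
n_refl, n_meas, sigma = 200, 4, 1.0               # 4 measurements each
I_true = np.abs(rng.normal(0.0, 0.5, size=n_refl))            # weak shell
I_obs = I_true[:, None] + rng.normal(0.0, sigma, size=(n_refl, n_meas))
I_mean = I_obs.mean(axis=1, keepdims=True)

# Rmerge = sum |I - <I>| / sum I : tiny denominator for weak data
r_merge = np.abs(I_obs - I_mean).sum() / I_obs.sum()

# chi^2 compares the scatter to the estimated sigmas instead
chi2 = ((I_obs - I_mean) ** 2 / sigma**2).sum() / (n_refl * (n_meas - 1))

print(f"Rmerge = {r_merge:.2f}")   # >> 1 (over 100%), yet nothing is wrong
print(f"chi^2  = {chi2:.2f}")      # ~ 1: data agree within their sigmas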
Re: [ccp4bb] Highest shell standards
I have a question about how the experimental sigmas are affected when one includes resolution shells containing mostly unobserved reflections. Does this vary with the data reduction software being used?

One thing I've noticed when scaling data (this with d*trek (Crystal Clear), since it's the program I use most) is that the I/sigma(I) of reflections can change significantly when one changes the high resolution cutoff. If I set the detector so that the edge is about where I stop seeing reflections and integrate to the corner of the detector, I'll get a dataset where I/sigma(I) is really compressed - there is a lot of high resolution data with I/sigma(I) about 1, but for the lowest resolution shell the overall I/sigma(I) will be maybe 8-9. If the dataset is cut off at a lower resolution (where I/sigma(I) in the shell is about 2) and scaled, I/sigma(I) in the lowest resolution shell will be maybe 20 or even higher (OK, there is a different resolution cutoff for this shell, but if I look at individual reflections, the trend holds). Since maximum likelihood refinement uses the sigmas for weighting, this must affect the refinement. My experience is that interpretation of the maps is easier when the cut-off datasets are used. (Refinement is via refmac5 or shelx.) Also, I'm mostly talking about datasets from well-diffracting crystals (better than 2 A).

Sue

On Mar 22, 2007, at 2:29 AM, Eleanor Dodson wrote:
> I feel that is rather severe for ML refinement - sometimes, for instance, it helps to use all the data from the images, integrating right into the corners, thus getting a very incomplete set for the highest resolution shell. But for exptl phasing it does not help to have many, many weak reflections... Is there any way of testing this though? The only way I can think of is to refine against a poorer set with varying protocols, then improve crystals/data and see which protocol for the poorer data gave the best agreement on the model comparison. And even that is not decisive - presumably the data would have come from different crystals, with maybe small diffs between the models...
>
> Eleanor
>
> Shane Atwell wrote:
>> Could someone point me to some standards for data quality, especially for publishing structures? [full question quoted below in this digest; snipped]

Sue Roberts
Biochemistry & Biophysics
University of Arizona
[EMAIL PROTECTED]
Re: [ccp4bb] Highest shell standards
> I typically process my data to a maximum I/sig near 1, and completeness in the highest resolution shell to 50% or greater. It

What about maps computed from very incomplete datasets at high resolution? Don't you get a false sense of detail when the missing reflections are filled in with DFc when computing a 2mFo-DFc map?

P
Re: [ccp4bb] Highest shell standards
I have observed something similar myself, using SAINT on a Bruker SMART 6K detector and using Denzo with lab and synchrotron detectors.

First, the I over sigma never really drops to zero, no matter how far beyond your real resolution limit you integrate. Second, if I integrate to the visual resolution limit of, say, 1.5 A, I get nice dataset statistics. If I now re-integrate (and re-scale) to 1.2 A, thus including mostly empty (background) pixels everywhere, then cut the dataset after scaling to the same 1.5 A limit, the statistics are much worse, both in I over sigma and Rint. (Sorry, no numbers here; I tried this some time ago.) I guess the integration is suffering at the profile-fitting level, while the scaling suffers from general noise (those weak reflections between 1.5 A and 1.2 A will be half of your total data!).

I would be careful about going much beyond the visual resolution limit.

Jose.

**
Jose Antonio Cuesta-Seijo
Cancer Genomics and Proteomics
Ontario Cancer Institute, UHN
MaRS TMDT Room 4-902M
101 College Street
M5G 1L7 Toronto, ON, Canada
Phone: (416)581-7544
Fax: (416)581-7562
email: [EMAIL PROTECTED]
**

On Mar 22, 2007, at 10:59 AM, Sue Roberts wrote:
> I have a question about how the experimental sigmas are affected when one includes resolution shells containing mostly unobserved reflections. Does this vary with the data reduction software being used? [earlier messages snipped; see above]
Re: [ccp4bb] Highest shell standards
My guess is that the integration is roughly the same, unless the profiles are really poorly defined, but that it is the scaling that suffers from using a lot of high-resolution weak data. We've integrated data to, say, I/sig = 0.5, and sometimes see more problems with scaling. I then cut back to I/sig = 1 and it's fine. The major difficulty arises if the crystal is dying and the decay/scaling/absorption model isn't good enough. So that's definitely a consideration when trying to get a more complete dataset and higher resolution (and so more redundancy).

Bernie

On Thu, March 22, 2007 12:21 pm, Jose Antonio Cuesta-Seijo wrote:
> I have observed something similar myself... First, the I over sigma never really drops to zero, no matter how far beyond your real resolution limit you integrate. [earlier messages snipped; see above]
[ccp4bb] Highest shell standards
Could someone point me to some standards for data quality, especially for publishing structures? I'm wondering in particular about highest shell completeness, multiplicity, sigma and Rmerge.

A co-worker pointed me to a '97 article by Kleywegt and Jones: http://xray.bmc.uu.se/gerard/gmrp/gmrp.html

"To decide at which shell to cut off the resolution, we nowadays tend to use the following criteria for the highest shell: completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion, it is better to have a good 1.8 Å structure than a poor 1.637 Å structure."

Are these recommendations still valid with maximum likelihood methods? We tend to use more data, especially in terms of the Rmerge and sigma cutoff.

Thanks in advance,

Shane Atwell
Re: [ccp4bb] Highest shell standards
Shane Atwell wrote:
> Could someone point me to some standards for data quality, especially for publishing structures? [...] "completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion, it is better to have a good 1.8 Å structure than a poor 1.637 Å structure." [quoted question snipped; see above]

Hi Shane,

I definitely no longer support the conclusions of that 1997 paper, and I think Gerard has probably adjusted his thoughts on this matter as well. Leaving out the data beyond 1.8 Å (in the example above) only makes sense if there is no information in those data. Completeness and multiplicity are not direct measures of data quality, and the "60% with I > 3 sigma(I)" and "Rmerge < 40%" criteria are too strict to my liking.

I prefer to look mostly at I/SigI, and as a reviewer I have no problem with highest-resolution-shell stats with I/SigI anywhere in the 1.5-2.5 range. I won't complain about higher I/SigI values if done for the right reasons (phasing data sets being the most common), but will say something if they state their crystals diffract to 2.5 Å when the I/SigI in the highest resolution shell is, let's say, 5. Their crystals don't diffract to 2.5 Å; they just didn't let the crystals diffract to their full potential. You can't really reject papers for that reason, but there appears to be a conservative epidemic when it comes to restricting the resolution of the data set.

Bart

--
==
Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone: 1-780-492-0042
fax: 1-780-492-7521
==
Re: [ccp4bb] Highest shell standards
Here are my 2-3 cents worth on the topic.

The first thing to keep in mind is that the goal of a structure determination is not to get the best stats or to claim the highest possible resolution. The goal is to get the best possible structure and to be confident that observed features in a structure are real and not the result of noise. From that perspective, if any of the conclusions one draws from a structure change depending on whether one includes data with an I/sigI in the highest resolution shell of 2 or of 1, one probably treads on thin ice.

The general guide that one should include only data for which the shell's average I/sigI > 2 comes from the following simple consideration:

F/sigF = 2 I/sigI (approximately, by error propagation)

So if you include data with an I/sigI of 2, then your F/sigF = 4. In other words, you will have roughly 25% experimental uncertainty in your F. Now assume that you actually knew the structure of your protein and calculated the crystallographic R-factor between the Fcalcs from your true structure and the observed F's. In this situation you would expect to get a crystallographic R-factor around 25%, simply because of the average error in your experimental structure factors. Since most macromolecular structures have R-factors around 20%, it makes little sense to include data where the experimental uncertainty alone will guarantee that your R-factor will be worse. Of course, these days maximum-likelihood refinement will just down-weight such data, and all you do is burn CPU cycles.

If you actually want to do a semi-rigorous test of where you should stop including data, simply include increasingly higher resolution data in your refinement and see if your structure improves. If you have really high resolution data (i.e. better than 1.2 Angstrom), you can do matrix inversion in SHELX and get estimated standard deviations (esds) for your refined parameters. As you include more and more data, the esds should initially decrease. Simply keep including higher resolution data until your esds start to increase again. Similarly, for lower resolution data you can monitor some molecular parameters which are not included in the stereochemical restraints and see if the inclusion of higher-resolution data makes the agreement between the observed and expected parameters better. For example, SHELX does not restrain torsion angles in aliphatic portions of side chains. If your structure improves, those angles should cluster more tightly around +60, -60 and 180 degrees...

Cheers,

Ulrich

Shane Atwell wrote:
> Could someone point me to some standards for data quality, especially for publishing structures? [...] "completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %" [...] Are these recommendations still valid with maximum likelihood methods? [quoted question snipped; see above]
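[Editor's note: the "F/sigF = 2 I/sigI" rule Ulrich quotes is first-order error propagation; the derivation is spelled out here for completeness (mine, not part of the original message). With I = F^2:]

\sigma_I \;\approx\; \left|\frac{\mathrm{d}I}{\mathrm{d}F}\right|\sigma_F \;=\; 2F\,\sigma_F
\quad\Longrightarrow\quad
\frac{F}{\sigma_F} \;=\; \frac{2F^2}{\sigma_I} \;=\; \frac{2I}{\sigma_I}.

[The linearization assumes sigma_I << I, and it breaks down as I approaches zero - exactly the weak-data regime Holton describes earlier in this thread.]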