Re: [ccp4bb] Intensities and amplitudes
This debate has run and run, and the statistics are unequivocal, BUT I still think we do a poor job of estimating sigmas. It is educational to take the same data integrated and processed by different algorithms and compare the estimated SigmaIs, both before and after merging. And I know we do an even poorer job of estimating the correlation between different observations. Most programs ignore that and treat all observations as independent. Randy's new ML formulation might address this, but it is difficult to combine model effects and measurement anomalies.

Small molecule people have two advantages: they usually measure their data more reliably, and they have enough observations to override bad statistics.

So until these questions are nearer to being solved, I am not sure whether the results (by which I mean the electron density) will be very different whether we refine against Is or Fs.

A Luddite point of view.

Eleanor

On 3 December 2014 at 19:16, Boaz Shaanan bshaa...@bgu.ac.il wrote:

> Hi Randy,
>
> Question regarding your reply to Pavel:
>
>> That may well involve the French Wilson algorithm, but can take advantage of whatever is understood by the program (e.g. anisotropy and translational non-crystallographic symmetry, both of which in principle can be modeled better as the atomic model improves).
>
> I may have misunderstood you completely, but do you mean that the Fobs's will be recalculated as the model improves (this is where French-Wilson comes into effect, right)? Or only once during refinement?
>
> Cheers,
> Boaz
Re: [ccp4bb] Intensities and amplitudes
Dear Randy,

I could not agree more. Statistical methods for phasing and refinement must be better using the observed intensities and their esds than with (c)truncated F-values. In particular, one should merge intensities, not truncated Fs!

To elaborate on Harry's comment: when SHELXL started refining only against intensities 22 years ago, I received many complaints from irate small molecule crystallographers whose papers had been rejected because the unweighted R-factors R2 (based on intensities) were too high. I even sent a letter to the editors of the journals involved to explain why R-factors based on intensities are at least twice as high as those based on F, but to no avail. So I had to output R1 (the unweighted R-value based on F) even though the structure had been refined against intensities; then everyone was happy.

Do I correctly understand that you have developed new (better) maximum likelihood criteria for use with I rather than F?

Best wishes,
George

On 12/02/2014 10:26 PM, Randy Read wrote:

> Dear Mohamed,
>
> At the moment, a lot of programs require amplitudes, but I believe that they should all eventually be updated to use intensities. In fact, we’re in the end stages of a large project to switch Phaser from using amplitudes to using intensities.
>
> There are a number of reasons why, in principle, it’s better to work in terms of intensities. One is that it’s perfectly reasonable to have a negative observed intensity, which can come from subtracting a background estimate with measurement errors from a very weak peak with its own measurement errors. That, of course, is where the French and Wilson algorithm comes in, allowing an amplitude to be estimated without simply taking a square root. However, the problem with the French and Wilson algorithm is that it loses information, i.e. you can’t reconstruct the intensity and its standard deviation. What you get out of French Wilson depends on the estimate of the expected intensity for a reflection, which is typically taken from the mean in the resolution shell but should vary with direction for crystals suffering from anisotropic diffraction and should be modulated for crystals with translational non-crystallographic symmetry.
>
> Another reason it’s better to work in terms of intensities is that it’s reasonable to assume that the measurement errors for intensities are Gaussian, but less reasonable to assume that for amplitudes (particularly with the problem that amplitudes can’t be negative).
>
> For now, you need amplitudes for a lot of purposes and then the French Wilson algorithm is useful. But what I would strongly recommend is that you hang on to the intensities and you make sure that the intensities are deposited at the PDB. It’s a pity that many PDB depositions only have amplitudes that have been through French Wilson, so that new procedures based on intensities won’t be able to be applied with their full power.
>
> Best wishes,
> Randy Read

--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, University of Goettingen,
Tammannstr. 4, D37077 Goettingen, Germany
Tel. +49-551-39-33021 or -33068
Fax. +49-551-39-22582
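The factor of two that George mentions can be seen in a small numerical sketch. The Python below is only an illustration under stated assumptions: the amplitudes are synthetic (drawn from an acentric Wilson distribution), the calculated amplitudes differ from the observed ones by a 5% Gaussian relative error, and the helper name `r_factor` is hypothetical rather than taken from SHELXL or any other program. Since I = F², a small relative error in F roughly doubles when expressed in I, so the unweighted R-factor on intensities comes out at about twice the one on amplitudes. Real data, with errors that are not proportional to F, can push the ratio higher.

```python
import numpy as np


def r_factor(obs, calc):
    """Unweighted R-factor: sum|obs - calc| / sum|obs|."""
    obs = np.asarray(obs, dtype=float)
    calc = np.asarray(calc, dtype=float)
    return float(np.sum(np.abs(obs - calc)) / np.sum(np.abs(obs)))


# Synthetic data (assumption): acentric Wilson intensities with unit mean.
rng = np.random.default_rng(0)
f_obs = np.sqrt(rng.exponential(scale=1.0, size=100_000))

# Assumed model error: 5% Gaussian relative error on each amplitude.
f_calc = f_obs * (1.0 + rng.normal(0.0, 0.05, size=f_obs.size))

r_on_f = r_factor(f_obs, f_calc)            # R based on amplitudes
r_on_i = r_factor(f_obs ** 2, f_calc ** 2)  # R based on intensities

print(f"R(F) = {r_on_f:.4f}  R(I) = {r_on_i:.4f}  ratio = {r_on_i / r_on_f:.2f}")
```

The ratio depends on the error model chosen here, so it should be read as a plausibility check on the argument, not as a property of any particular dataset.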
Re: [ccp4bb] Intensities and amplitudes
Dear George,

Yes, we’ve developed new likelihood functions that work with intensity data. They’re already available for the molecular replacement calculations in recent nightly-build versions of Phaser (though it’s been a while since a new nightly was released, and we’ve fixed a few problems with outliers that were more extreme than we had anticipated encountering). I’ll be presenting our work on the intensity-based SAD likelihood target at the upcoming CCP4 Study Weekend.

It’s possible to define exact intensity-based likelihood functions (at least “exact” when the measurement errors are Gaussian in the observed intensity), but we haven’t found a way of evaluating them without either numerical integration (expensive) or approximation. However, we’ve got a new approximation that turns out to be excellent over the whole range from small to extremely large intensity errors, and which is very efficient to work with.

Best wishes,
Randy

On 3 Dec 2014, at 15:07, George M. Sheldrick gshe...@shelx.uni-ac.gwdg.de wrote:

> Dear Randy,
>
> I could not agree more. Statistical methods for phasing and refinement must be better using the observed intensities and their esds than with (c)truncated F-values. In particular, one should merge intensities, not truncated Fs!
>
> To elaborate on Harry's comment: when SHELXL started refining only against intensities 22 years ago, I received many complaints from irate small molecule crystallographers whose papers had been rejected because the unweighted R-factors R2 (based on intensities) were too high. I even sent a letter to the editors of the journals involved to explain why R-factors based on intensities are at least twice as high as those based on F, but to no avail. So I had to output R1 (the unweighted R-value based on F) even though the structure had been refined against intensities; then everyone was happy.
>
> Do I correctly understand that you have developed new (better) maximum likelihood criteria for use with I rather than F?
>
> Best wishes,
> George

--
Randy J. Read
Department of
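To show what "numerical integration (expensive)" can look like in the simplest case, here is a hedged Python sketch. It is not Phaser's likelihood target or its approximation. It assumes a single acentric reflection, normalised amplitudes, a Rice distribution for the true amplitude E given the calculated amplitude (parameterised by a sigma-A style correlation), and Gaussian measurement error on the observed intensity. The function name `intensity_likelihood` and all parameter names are hypothetical. The likelihood of an observed intensity is then the Rice density integrated against the Gaussian error over all possible true amplitudes, done here by brute force on a grid.

```python
import numpy as np


def intensity_likelihood(i_obs, sigma_i, e_calc, sigma_a, n_grid=4001, e_max=8.0):
    """p(i_obs | e_calc) for one acentric reflection, by grid integration.

    Assumptions (illustrative only): normalised amplitudes, Rice-distributed
    true amplitude E given e_calc, Gaussian error sigma_i on the intensity.
    """
    e = np.linspace(0.0, e_max, n_grid)  # grid over the true amplitude E
    de = e[1] - e[0]
    var = 1.0 - sigma_a ** 2

    # Acentric Rice density p(E | e_calc). np.i0 is adequate for moderate
    # arguments; a production code would guard against overflow.
    rice = (2.0 * e / var) * np.exp(-(e ** 2 + (sigma_a * e_calc) ** 2) / var) \
        * np.i0(2.0 * e * sigma_a * e_calc / var)

    # Gaussian probability of measuring i_obs when the true intensity is E^2.
    gauss = np.exp(-0.5 * ((i_obs - e ** 2) / sigma_i) ** 2) \
        / (sigma_i * np.sqrt(2.0 * np.pi))

    return float(np.sum(rice * gauss) * de)


# A negative observed intensity still has a well-defined, positive likelihood.
print(intensity_likelihood(i_obs=-0.3, sigma_i=0.5, e_calc=1.5, sigma_a=0.8))
```

Every evaluation costs a full pass over the grid, and a refinement or search needs this for every reflection at every step, which is why an accurate closed-form approximation is attractive.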
Re: [ccp4bb] Intensities and amplitudes
Hi Randy,

Question regarding your reply to Pavel:

> That may well involve the French Wilson algorithm, but can take advantage of whatever is understood by the program (e.g. anisotropy and translational non-crystallographic symmetry, both of which in principle can be modeled better as the atomic model improves).

I may have misunderstood you completely, but do you mean that the Fobs's will be recalculated as the model improves (this is where French-Wilson comes into effect, right)? Or only once during refinement?

Cheers,
Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220
Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Randy Read [rj...@cam.ac.uk]
Sent: Wednesday, December 03, 2014 12:46 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Intensities and amplitudes

Hi Pavel,

We were chatting with Phil Evans the other day about things like this, and generally we were in agreement that any programs that need amplitudes (and you’re right of course, you have to have some sort of amplitude to calculate a map!) should be able to compute them on the fly. That may well involve the French Wilson algorithm, but can take advantage of whatever is understood by the program (e.g. anisotropy and translational non-crystallographic symmetry, both of which in principle can be modeled better as the atomic model improves).

I haven’t really worried about R-factors. We could learn to embrace the R-factor on intensity that small molecule crystallographers are comfortable with but, as you say, people are not used to these. If we compute amplitudes on the fly, with a French Wilson algorithm that is calibrated better as the model improves, the R-factors will be calculated with a changing set of Fobs. This would probably be a minor effect, but it’s slightly disconcerting.

Randy
Re: [ccp4bb] Intensities and amplitudes
Dear Mohamed,

At the moment, a lot of programs require amplitudes, but I believe that they should all eventually be updated to use intensities. In fact, we’re in the end stages of a large project to switch Phaser from using amplitudes to using intensities.

There are a number of reasons why, in principle, it’s better to work in terms of intensities. One is that it’s perfectly reasonable to have a negative observed intensity, which can come from subtracting a background estimate with measurement errors from a very weak peak with its own measurement errors. That, of course, is where the French and Wilson algorithm comes in, allowing an amplitude to be estimated without simply taking a square root. However, the problem with the French and Wilson algorithm is that it loses information, i.e. you can’t reconstruct the intensity and its standard deviation. What you get out of French Wilson depends on the estimate of the expected intensity for a reflection, which is typically taken from the mean in the resolution shell but should vary with direction for crystals suffering from anisotropic diffraction and should be modulated for crystals with translational non-crystallographic symmetry.

Another reason it’s better to work in terms of intensities is that it’s reasonable to assume that the measurement errors for intensities are Gaussian, but less reasonable to assume that for amplitudes (particularly with the problem that amplitudes can’t be negative).

For now, you need amplitudes for a lot of purposes and then the French Wilson algorithm is useful. But what I would strongly recommend is that you hang on to the intensities and you make sure that the intensities are deposited at the PDB. It’s a pity that many PDB depositions only have amplitudes that have been through French Wilson, so that new procedures based on intensities won’t be able to be applied with their full power.

Best wishes,
Randy Read

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: +44 1223 336500
Wellcome Trust/MRC Building                 Fax: +44 1223 336827
Hills Road                                  E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk

On 1 Dec 2014, at 20:49, Mohamed Noor mohamed.n...@staffmail.ul.ie wrote:

> Dear crystallographers,
>
> Is there any reason for using one data type over the other? Are there any errors associated with the French and Wilson I-to-F conversion step?
>
> Thanks.
> Mohamed
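The points about negative intensities and the dependence on the expected intensity can be illustrated with a minimal Python sketch of a French and Wilson style estimate. This is a simplified illustration, not the code of any CCP4 program: it assumes an acentric reflection, a Wilson prior on the true intensity J (exponential, with mean `mean_intensity`), Gaussian measurement error on the observed intensity, and plain grid integration. The function name `french_wilson_acentric` and its parameters are hypothetical.

```python
import numpy as np


def french_wilson_acentric(i_obs, sigma_i, mean_intensity, n_grid=20001):
    """Posterior means of the true intensity J and the amplitude sqrt(J).

    Assumes an acentric Wilson prior p(J) ~ exp(-J / mean_intensity), J >= 0,
    and a Gaussian measurement error of standard deviation sigma_i on i_obs.
    """
    # The posterior is a Gaussian truncated at zero, so ten sigma beyond the
    # observation captures essentially all of its mass.
    upper = max(i_obs, 0.0) + 10.0 * sigma_i
    j = np.linspace(0.0, upper, n_grid)

    log_post = -0.5 * ((i_obs - j) / sigma_i) ** 2 - j / mean_intensity
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()  # uniform grid: normalised weights act as a quadrature rule

    return float(np.sum(weights * j)), float(np.sum(weights * np.sqrt(j)))


# The same negative observation gives different amplitudes for different
# assumed mean intensities in the resolution shell.
for prior_mean in (20.0, 200.0):
    j_hat, f_hat = french_wilson_acentric(-5.0, 10.0, prior_mean)
    print(f"mean intensity {prior_mean:6.1f}: <J> = {j_hat:.2f}, <F> = {f_hat:.2f}")
```

Because the output amplitude mixes the measurement with the prior mean, the original intensity and its standard deviation cannot be recovered from the amplitude alone, which is the information loss described above. For a strong, well-measured reflection the estimate approaches the simple square root.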
Re: [ccp4bb] Intensities and amplitudes
Hi Randy,

I can see all the good reasons for using intensities! What about maps and R-factors? I guess you still need F to compute them (I realize you can compute R(I), but this is not what people are used to doing in general), and if that's the case then the I-to-F conversion is still inevitable (at least for some purposes).

Thanks,
Pavel

On Tue, Dec 2, 2014 at 1:26 PM, Randy Read rj...@cam.ac.uk wrote:

> Dear Mohamed,
>
> At the moment, a lot of programs require amplitudes, but I believe that they should all eventually be updated to use intensities. In fact, we’re in the end stages of a large project to switch Phaser from using amplitudes to using intensities.
>
> There are a number of reasons why, in principle, it’s better to work in terms of intensities. One is that it’s perfectly reasonable to have a negative observed intensity, which can come from subtracting a background estimate with measurement errors from a very weak peak with its own measurement errors. That, of course, is where the French and Wilson algorithm comes in, allowing an amplitude to be estimated without simply taking a square root. However, the problem with the French and Wilson algorithm is that it loses information, i.e. you can’t reconstruct the intensity and its standard deviation. What you get out of French Wilson depends on the estimate of the expected intensity for a reflection, which is typically taken from the mean in the resolution shell but should vary with direction for crystals suffering from anisotropic diffraction and should be modulated for crystals with translational non-crystallographic symmetry.
>
> Another reason it’s better to work in terms of intensities is that it’s reasonable to assume that the measurement errors for intensities are Gaussian, but less reasonable to assume that for amplitudes (particularly with the problem that amplitudes can’t be negative).
>
> For now, you need amplitudes for a lot of purposes and then the French Wilson algorithm is useful. But what I would strongly recommend is that you hang on to the intensities and you make sure that the intensities are deposited at the PDB. It’s a pity that many PDB depositions only have amplitudes that have been through French Wilson, so that new procedures based on intensities won’t be able to be applied with their full power.
>
> Best wishes,
> Randy Read
Re: [ccp4bb] Intensities and amplitudes
Hi Pavel,

We were chatting with Phil Evans the other day about things like this, and generally we were in agreement that any programs that need amplitudes (and you’re right of course, you have to have some sort of amplitude to calculate a map!) should be able to compute them on the fly. That may well involve the French Wilson algorithm, but can take advantage of whatever is understood by the program (e.g. anisotropy and translational non-crystallographic symmetry, both of which in principle can be modeled better as the atomic model improves).

I haven’t really worried about R-factors. We could learn to embrace the R-factor on intensity that small molecule crystallographers are comfortable with but, as you say, people are not used to these. If we compute amplitudes on the fly, with a French Wilson algorithm that is calibrated better as the model improves, the R-factors will be calculated with a changing set of Fobs. This would probably be a minor effect, but it’s slightly disconcerting.

Randy

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: +44 1223 336500
Wellcome Trust/MRC Building                 Fax: +44 1223 336827
Hills Road                                  E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk

On 2 Dec 2014, at 21:44, Pavel Afonine pafon...@gmail.com wrote:

> Hi Randy,
>
> I can see all the good reasons for using intensities! What about maps and R-factors? I guess you still need F to compute them (I realize you can compute R(I), but this is not what people are used to doing in general), and if that's the case then the I-to-F conversion is still inevitable (at least for some purposes).
>
> Thanks,
> Pavel
[ccp4bb] Intensities and amplitudes
Dear crystallographers,

Is there any reason for using one data type over the other? Are there any errors associated with the French and Wilson I-to-F conversion step?

Thanks.
Mohamed