Re: [ccp4bb] R too low?
You have obviously solved this problem, but one thing that can change apparent R-factors is the number of reflections accepted. If one processing run gives you, say, 5% more very weak reflections, those will inevitably have high R-factors, and this can increase the apparent R-factor without changing the map appearance much.

Eleanor

(In reply to Sue Roberts, 27 June 2013 21:26.)
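Eleanor's point about weak reflections can be illustrated with a toy calculation. This is a minimal sketch, not a model of any real data set: the reflection counts, amplitude levels and noise size are invented, and `r_factor` and `simulate` are hypothetical helpers. Fc is treated as the "true" amplitude and Fo as Fc plus a fixed amount of absolute measurement noise, so reflections near the noise level get a large relative error.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum |Fo - Fc| / sum Fo, with Fc assumed already on the scale of Fo."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def simulate(n, mean_f, noise_sigma, rng):
    """Hypothetical reflections: Fc drawn uniformly around mean_f; Fo = Fc plus
    Gaussian noise of fixed absolute size, clipped at zero like an amplitude."""
    f_calc = [rng.uniform(0.5 * mean_f, 1.5 * mean_f) for _ in range(n)]
    f_obs = [max(fc + rng.gauss(0.0, noise_sigma), 0.0) for fc in f_calc]
    return f_obs, f_calc


if __name__ == "__main__":
    rng = random.Random(0)
    strong_obs, strong_calc = simulate(2000, 100.0, 10.0, rng)  # well-measured reflections
    weak_obs, weak_calc = simulate(100, 5.0, 10.0, rng)        # 5% extra, near the noise level

    r_strong = r_factor(strong_obs, strong_calc)
    r_weak = r_factor(weak_obs, weak_calc)
    r_all = r_factor(strong_obs + weak_obs, strong_calc + weak_calc)
    print(f"strong only: R = {r_strong:.3f}")
    print(f"weak only:   R = {r_weak:.3f}")
    print(f"combined:    R = {r_all:.3f}")
```

The combined R is a weighted average of the subset R values, weighted by each subset's sum of Fo. So adding any subset whose own R exceeds the current value raises the overall R. Weak reflections contribute little to the sum of Fo, so how much R moves depends on how many of them are added and how weak they are.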
Re: [ccp4bb] R too low?
Hello everyone,

Thanks for all the help. The key to finding the problem was following up on Tim Gruene's suggestion to compare the data sets directly. It appears that an error occurred during the conversion from I to F. Until I find the log file for the conversion, I can't guess what was done.

Longer version: when I compared the good and bad data sets, R was about 0.15 instead of the 0.07 I was expecting. Yesterday I reintegrated the images using the same program that generated the bad data (CrystalClear; sorry to be opaque, but I didn't want to inspire a lot of discussion about various integration programs when I was pretty sure the program wasn't at fault). That produced a data set that agreed with the good data (XDS). (Yes, I should have done this before sending a message to ccp4bb.) The R for scaling the new CrystalClear data set against the XDS data set was 0.07. Refinement against it behaved as expected and agreed with refinement against the XDS data.

I have been unable to find the log file for the conversion from integrated I to MTZ F (it's on some computer somewhere, I'm sure). However, I did find the original ScalAveraged.ref file for the bad data and reimported it using the import scaled data task in ccp4i. That data set is also good. So I conclude that something was done wrong during import into CCP4. Tim suggested that perhaps the data were converted to amplitudes twice; perhaps that's it. Anyway, now I know where the problem arose.

Several people suggested checking statistics using phenix.polygon and other analysis tools in Phenix. I agree that those are nice tools (and we had done that). However, they only tell you how your statistics differ from the median, and they often don't give any hints as to how a problem might have arisen.

Again, thanks for all the help.

Sue

(In reply to Tim Gruene, 26 June 2013, 8:54 AM.)
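The direct comparison that exposed the problem can be sketched in a few lines: scale one data set against the other and compute the R between them (0.15 for the bad set, versus the expected 0.07). This is a simplified illustration under stated assumptions. Both data sets are plain dicts mapping (h, k, l) to amplitudes already reduced to the same asymmetric unit, and only one overall scale factor is fitted by least squares. Real scaling programs also refine resolution-dependent and anisotropic terms. `scale_and_compare` is a hypothetical name.

```python
def scale_and_compare(set_a, set_b):
    """Compare two reflection sets keyed by Miller index (h, k, l).

    Fits one overall scale k minimising sum (Fa - k*Fb)^2 over the common
    reflections, then returns (k, R) with R = sum |Fa - k*Fb| / sum Fa.
    Reflections present in only one set are ignored.
    """
    common = sorted(set_a.keys() & set_b.keys())
    if not common:
        raise ValueError("no reflections in common")
    fa = [set_a[hkl] for hkl in common]
    fb = [set_b[hkl] for hkl in common]
    denom = sum(b * b for b in fb)
    if denom == 0:
        raise ValueError("set_b amplitudes are all zero")
    k = sum(a * b for a, b in zip(fa, fb)) / denom
    r = sum(abs(a - k * b) for a, b in zip(fa, fb)) / sum(fa)
    return k, r


if __name__ == "__main__":
    a = {(1, 0, 0): 100.0, (0, 1, 0): 50.0, (1, 1, 0): 25.0, (2, 0, 0): 10.0}
    doubled = {hkl: 2.0 * f for hkl, f in a.items()}      # same data on a different scale
    extra_sqrt = {hkl: f ** 0.5 for hkl, f in a.items()}  # one square root too many
    print(scale_and_compare(a, doubled))     # (0.5, 0.0)
    print(scale_and_compare(a, extra_sqrt))  # R well above zero
```

A pure change of scale is absorbed completely by k. An extra square root is not absorbed, even though every reflection keeps its index and its rank order. That is the kind of discrepancy a repeated I-to-F conversion would leave in a direct comparison.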
[ccp4bb] R too low?
Hello everyone,

I have two data sets from the same crystal form (space group P32) of the same protein, collected at 100 K at SSRL to about 2.2 Å resolution, that refine to R = 0.14, Rfree = 0.26 (Refmac/TLS). This is a molecular replacement solution from a model with about 40% homology. After MR, density was apparent for some missing or misbuilt residues, so I don't think the structure is stuck in the wrong place. The Fo-Fc map is essentially featureless. The 2Fo-Fc map doesn't look as good as it should; for instance, there are very few water molecules to be found. The data reduction statistics look OK, and the resolution cutoff is pretty conservative. There is one molecule in the asymmetric unit, so no NCS. There is no twinning either. It seemed to me that R is too low, not that Rfree is too high. More normally, R ends up at about 0.18-0.20 for a data set at this resolution.

I reprocessed the images with a different data processing program and redid the MR. The data reduction statistics look similar and the resolution is the same, but now the structure refines to R = 0.20, Rfree = 0.24 (same free-R set of reflections, still Refmac/TLS). The maps look more normal. Further rebuilding took us to R = 0.18, Rfree = 0.22.

So the question I have (and that I've been asked by the student and PI) is: what was the problem with the original data set? What should I be looking for in the data reduction log files, for instance, or in the refinement log? The large gap between R and Rfree is characteristic of overfitting, but the geometry is not too loose (rmsd bonds = 0.14), and there are plenty of reflections (both working and free). Can anyone point me toward a reason R would be low?

Thanks,
Sue

Dr. Sue A. Roberts
Dept. of Chemistry and Biochemistry
University of Arizona
1041 E. Lowell St., Tucson, AZ 85721
Phone: 520 621 8171 or 520 621 4168
s...@email.arizona.edu
http://www.cbc.arizona.edu/xray or http://www.cbc.arizona.edu/facilities/x-ray_diffraction
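For reference, these are the standard definitions behind the R and Rfree values quoted in this thread. The working set W is used in refinement. The test set T is a small subset of reflections (often around 5%) that is excluded from refinement. Here k stands for the scaling between observed and calculated amplitudes; in practice, refinement programs model this with more than a single constant.

```latex
R_{\mathrm{work}} = \frac{\sum_{hkl \in W} \left|\, |F_{o}(hkl)| - k\,|F_{c}(hkl)| \,\right|}{\sum_{hkl \in W} |F_{o}(hkl)|},
\qquad
R_{\mathrm{free}} = \frac{\sum_{hkl \in T} \left|\, |F_{o}(hkl)| - k\,|F_{c}(hkl)| \,\right|}{\sum_{hkl \in T} |F_{o}(hkl)|}
```

Both quantities are computed from the observed amplitudes themselves. Any systematic change to how |Fo| was derived from the measured intensities therefore feeds directly into both numbers.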
Re: [ccp4bb] R too low?
Dear Sue,

if you made your rmsd (bonds) 20-30 times smaller, I would agree they were not too loose. 0.14 Å is pretty high. So, two suggestions:

a) check the MolProbity report for your PDB file to see whether its geometry is sane;
b) check the CC plot of one data set against the other, to see whether the problem is due to the two different data sets or to the PDB file (XPREP can do this plot conveniently).

Did you check whether you converted the data to amplitudes twice, or maybe not at all?

Best,
Tim

(In reply to Sue Roberts, 26 June 2013, 05:44 PM.)

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
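Tim's last question (were the data converted to amplitudes twice, or not at all?) can also be checked numerically. The sketch below is not the XPREP CC plot he mentions but a complementary check, with hypothetical function names and dict-based input. If Fb is roughly proportional to Fa raised to some power p, a least-squares fit of log Fb against log Fa has slope p. The slope is about 1 when both sets hold the same kind of quantity, about 0.5 if set_b went through one square root too many, and about 2 if set_b still holds intensities.

```python
import math


def log_log_slope(set_a, set_b):
    """Least-squares slope of log(Fb) against log(Fa) over shared reflections.

    set_a, set_b: dicts mapping (h, k, l) -> value. Only reflections present in
    both sets and positive in both are used. If Fb ~ c * Fa**p, the slope estimates p.
    """
    common = sorted(
        hkl for hkl in set_a.keys() & set_b.keys()
        if set_a[hkl] > 0 and set_b[hkl] > 0
    )
    if len(common) < 2:
        raise ValueError("need at least two positive reflections in common")
    xs = [math.log(set_a[hkl]) for hkl in common]
    ys = [math.log(set_b[hkl]) for hkl in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all set_a values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def interpret(slope, tol=0.15):
    """Map a slope onto the three cases in Tim's question (the tolerance is arbitrary)."""
    for expected, label in ((1.0, "same quantity"),
                            (0.5, "set_b square-rooted once too often"),
                            (2.0, "set_b still looks like intensities")):
        if abs(slope - expected) < tol:
            return label
    return "unclear"


if __name__ == "__main__":
    a = {(1, 0, 0): 100.0, (0, 1, 0): 50.0, (1, 1, 0): 25.0, (2, 0, 0): 10.0}
    b = {hkl: f ** 0.5 for hkl, f in a.items()}
    s = log_log_slope(a, b)
    print(f"slope = {s:.2f}: {interpret(s)}")  # slope = 0.50: set_b square-rooted once too often
```

With real data, measurement noise in the weak reflections pulls the fitted slope toward zero. A check like this is therefore only meaningful on strong, well-measured reflections common to both sets.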
Re: [ccp4bb] R too low?
Hi Sue,

Can you give the rmsZ values for the bonds and angles (from the Refmac output)? I never could figure these rmsd values out... I'm guessing that the restraints are too loose, or at least not optimal. Perhaps they went overboard with the TLS as well (sometimes fewer TLS groups give much better R and R-free values). I'm not sure anything in particular is wrong with the data processing. They should optimize the restraint weights in refinement first; in this case, tighter B-factor restraint weights might do the trick.

Gratuitous plug: throw the model and data into PDB_REDO (which uses Refmac too) and see if it gives better refinement results.

Cheers,
Robbie

(In reply to Sue Roberts, Wednesday, June 26, 2013 17:45.)
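Robbie's preference for rmsZ over rmsd can be made concrete. The rmsd is the root-mean-square deviation from the restraint targets, expressed in the quantity's own units (Å for bonds). The rmsZ divides each deviation by the sigma of its restraint before averaging, so it is dimensionless and comparable across different kinds of restraint. This is a minimal sketch: the function name and the bond lengths, targets and sigmas below are made-up illustrative values, whereas Refmac takes its targets and sigmas from its own restraint dictionary.

```python
import math


def rmsd_and_rmsz(observed, targets, sigmas):
    """Return (rmsd, rmsZ) for a set of restrained geometric quantities.

    rmsd = sqrt(mean((d - d0)^2))        in the quantity's own units
    rmsZ = sqrt(mean(((d - d0) / s)^2))  dimensionless; 1 means deviations
                                         equal the restraint sigmas on average
    """
    if not observed or not (len(observed) == len(targets) == len(sigmas)):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(observed)
    deviations = [d - d0 for d, d0 in zip(observed, targets)]
    rmsd = math.sqrt(sum(x * x for x in deviations) / n)
    rmsz = math.sqrt(sum((x / s) ** 2 for x, s in zip(deviations, sigmas)) / n)
    return rmsd, rmsz


if __name__ == "__main__":
    # Hypothetical bond lengths (Angstrom) with one target and one sigma each.
    bonds = [1.54, 1.52, 1.50]
    ideal = [1.53, 1.53, 1.53]
    sigma = [0.02, 0.02, 0.02]
    rmsd, rmsz = rmsd_and_rmsz(bonds, ideal, sigma)
    print(f"rmsd = {rmsd:.4f} A, rmsZ = {rmsz:.2f}")
```

When every restraint has the same sigma, rmsZ is simply rmsd divided by that sigma. With an illustrative uniform bond sigma of 0.02 Å, for example, an rmsd of 0.14 Å would correspond to an rmsZ of 7.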