Re: [ccp4bb] an over refined structure

Dirk Kostrewa Fri, 08 Feb 2008 00:15:36 -0800

Dear Dean and others,

Peter Zwart gave me a similar reply. This is a very interesting discussion, and I would like to take a somewhat closer look at it to maybe make things a little bit clearer (please excuse the general explanations - this might be interesting for beginners as well):

1). Crystallographic symmetry can be applied to the whole crystal and results in symmetry-equivalent intensities in reciprocal space. If you refine your model in a lower space group, there will be reflections in the test-set that are symmetry-equivalent in the higher space group to reflections in the working set. If you refine the (symmetry-equivalent) copies in your crystal independently, they will diverge due to resolution and data quality, and R-work and R-free will diverge to some extent due to this. If you force the copies to be identical, the R-work & R-free will still be different due to observational errors. In both cases, however, the R-free will be very close to the R-work.

2). In case of NCS, the continuous molecular transform will reflect this internal symmetry, but because it is only a local symmetry, the observed reflections sample the continuous transform at different points and their corresponding intensities are generally different. It might, however, happen that a test-set reflection comes _very_ close in reciprocal space to a "NCS-related" working-set reflection, and in such a case their intensities will be very similar and this will make the R-free closer to the R-work. If you do not apply NCS-averaging in the form of restraints/constraints, these accidentally close reflections will be the only cases where R-free might be too close to R-work. If you apply NCS-averaging, then in real space you multiply the electron density with a mask and average the NCS-related copies within this mask at all NCS-related positions. In reciprocal space, you then convolute the Fourier-transform of that mask with your observed intensities in all NCS-related positions. This will force test-set reflections to become more similar to NCS-related working-set reflections, and thus the R-free will be heavily biased towards R-work. The range of this influence in reciprocal space can be approximated by replacing the mask with a sphere and calculating the Fourier-transform of this sphere. This will give the so-called G-function, whose first zero-value determines its radius of influence in reciprocal space.
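As a rough illustration of the G-function argument above: for a sphere of radius R, the normalised Fourier transform is G(x) = 3(sin x - x cos x)/x^3 with x = 2*pi*R*s, where s is the distance in reciprocal space. The Python sketch below finds the first zero numerically and converts it into a radius of influence. The function names are made up for this example, and the 20 A sphere radius is an arbitrary value, not taken from any structure discussed here.

```python
import math

def g_function(x):
    """Normalised Fourier transform of a solid sphere, G(0) = 1."""
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0  # leading terms of the series near x = 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3

def first_zero(lo=4.0, hi=5.0, tol=1e-12):
    """First positive zero of G, found by bisection on sin x - x cos x.

    The bracket [4, 5] is an assumption of this sketch; the numerator is
    positive at 4 and negative at 5, and has no zero between 0 and 4.
    """
    f = lambda x: math.sin(x) - x * math.cos(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

X0 = first_zero()

def radius_of_influence(sphere_radius):
    """Reciprocal-space distance (1/length) at which G first drops to zero."""
    return X0 / (2.0 * math.pi * sphere_radius)

# Arbitrary example: a spherical mask of radius 20 A.
print(X0, radius_of_influence(20.0))
```

The first zero lies near x = 4.49, so the radius of influence is roughly 0.715/R: the larger the averaging mask, the shorter the reach of the coupling between neighbouring reflections.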


To summarize:

(i) One can't directly compare crystallographic and non-crystallographic symmetry.

(ii) In case of NCS, I have to admit that even if you do not apply NCS-restraints/constraints, there will be some effect on the R-free by chance. So, my original statement was too strict in this respect. But only if you really apply NCS-restraints/constraints do you force a bias of the R-free towards the R-work, with an approximate radius given by the G-function in reciprocal space.


What an interesting discussion!

Best regards,

Dirk.

On 07.02.2008 at 18:57, Dean Madden wrote:

Hi Dirk,
I disagree with your final sentence. Even if you don't apply NCS restraints/constraints during refinement, there is a serious risk of NCS "contaminating" your Rfree. Consider the limiting case in which the "NCS" is produced simply by working in an artificially low symmetry space-group (e.g. P1, when the true symmetry is P2): in this case, putting one symmetry mate in the Rfree set, and one in the Rwork set will guarantee that Rfree tracks Rwork. The same effect applies to a large extent even if the NCS is not crystallographic.
Bottom line: thin shells are not a perfect solution, but if NCS is present, choosing the free set randomly is *never* a better choice, and almost always significantly worse. Together with multicopy refinement, randomly chosen test sets were almost certainly a major contributor to the spuriously good Rfree values associated with the retracted MsbA and EmrE structures.
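The limiting case (P1 data, true symmetry P2) can be made concrete with a small count. The Python sketch below is only an illustration under stated assumptions: it uses a made-up box of Miller indices rather than real data, takes b as the unique axis so that the two-fold relates (h, k, l) to (-h, k, -l), and picks a 5% test set at random. All names are hypothetical.

```python
import random

def p2_mate(hkl):
    """Two-fold along b (assumed unique axis): (h, k, l) -> (-h, k, -l)."""
    h, k, l = hkl
    return (-h, k, -l)

def canonical(hkl):
    """One representative per Friedel pair, so each P1 reflection is counted once."""
    return max(hkl, tuple(-i for i in hkl))

def fraction_test_with_mate_in_work(hmax=8, test_fraction=0.05, seed=0):
    """Fraction of general test reflections whose P2 mate sits in the working set."""
    rng = random.Random(seed)
    box = range(-hmax, hmax + 1)
    unique = sorted({canonical((h, k, l))
                     for h in box for k in box for l in box
                     if (h, k, l) != (0, 0, 0)})
    test = set(rng.sample(unique, int(test_fraction * len(unique))))
    general = 0
    mate_in_work = 0
    for hkl in test:
        mate = canonical(p2_mate(hkl))
        if mate == hkl:
            continue  # maps onto itself (or its Friedel mate): no separate partner
        general += 1
        if mate not in test:
            mate_in_work += 1
    return mate_in_work / general

print(fraction_test_with_mate_in_work())
```

With a 5% random test set, the large majority of test reflections have their symmetry mate in the working set, which is why Rfree follows Rwork in this case.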
Best wishes,
Dean

Dirk Kostrewa wrote:
Dear CCP4ers,
I'm not convinced that thin shells are sufficient: I think, in principle, one should omit thick shells (greater than the diameter of the G-function of the molecule/assembly that is used to describe NCS-interactions in reciprocal space), and use the inner thin layer of these thick shells, because only those should be completely independent of any working set reflections. But this would be too "expensive" given the low number of observed reflections that one usually has ... However, if you don't apply NCS restraints/constraints, there is no need for any such precautions.
Best regards,
Dirk.
On 07.02.2008 at 16:35, Doug Ohlendorf wrote:
It is important when using NCS that the Rfree reflections be selected in distributed thin resolution shells. That way application of NCS should not mix Rwork and Rfree sets. Normal random selection of Rfree + NCS (especially 4x or higher) will drive Rfree down unfairly.
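One possible way to pick the test set in thin shells is sketched below in Python. This is not the algorithm of any particular program: the function name, the number of shells and the spacing between free shells are all assumptions made for illustration. Shells are cut at equal steps in 1/d^3 so that each holds a similar number of reflections, and every reflection in a chosen shell goes into the test set.

```python
def thin_shell_free_flags(d_spacings, n_shells=100, free_every=20):
    """Return one bool per reflection: True if it falls in a free shell.

    d_spacings: resolution d of each reflection (same length unit throughout).
    Shell boundaries are equally spaced in 1/d^3 (equal reciprocal volume).
    One shell in every `free_every` is flagged, i.e. about 5% by default.
    """
    s3 = [1.0 / d**3 for d in d_spacings]
    lo, hi = min(s3), max(s3)
    width = (hi - lo) / n_shells
    if width == 0.0:
        return [False] * len(s3)  # all reflections at one resolution: no shells
    flags = []
    for v in s3:
        shell = min(int((v - lo) / width), n_shells - 1)
        flags.append(shell % free_every == free_every // 2)
    return flags

# Toy example: 10000 made-up d-spacings between 30 and 2 Angstrom.
n = 10000
s3_lo, s3_hi = 1.0 / 30.0**3, 1.0 / 2.0**3
toy_d = [(s3_lo + (s3_hi - s3_lo) * i / (n - 1)) ** (-1.0 / 3.0) for i in range(n)]
toy_flags = thin_shell_free_flags(toy_d)
print(sum(toy_flags) / n)
```

The free shells are spread evenly over the resolution range, so the test set still samples low and high resolution while keeping each free reflection surrounded, within its shell, by other free reflections.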

Doug Ohlendorf

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Eleanor Dodson
Sent: Tuesday, February 05, 2008 3:38 AM
To: [email protected]
Subject: Re: [ccp4bb] an over refined structure
I agree that the difference in Rwork to Rfree is quite acceptable at your resolution. You cannot/should not use R-factors as a criterion for structure correctness. As Ian points out - choosing a different Rfree set of reflections can change Rfree a good deal. Certain NCS operators can relate reflections exactly, making it hard to get a truly independent Free R set, and there are other reasons to make it a blunt edged tool.
The map is the best validator - are there blobs still not fitted? (maybe side chains you have placed wrongly..) Are there many positive or negative peaks in the difference map? How well does the NCS match the 2 molecules?
etc etc.
Eleanor

George M. Sheldrick wrote:
Dear Sun,
If we take Ian's formula for the ratio of R(free) to R(work) from his paper Acta D56 (2000) 442-450 and make some reasonable approximations, we can reformulate it as:

R(free)/R(work) = sqrt[(1+Q)/(1-Q)]  with  Q = 0.025pd^3(1-s)
where s is the fractional solvent content, d is the resolution, p is the effective number of parameters refined per atom after allowing for the restraints applied, d^3 means d cubed and sqrt means square root.
The difficult number to estimate is p. It would be 4 for an isotropic refinement without any restraints. I guess that p=1.5 might be an appropriate value for a typical protein refinement (giving an R-factor ratio of about 1.4 for s=0.6 and d=2.8). In that case, your R-factor ratio of 0.277/0.215 = 1.29 is well within the allowed range!
However it should be added that this formula is almost a self-fulfilling prophecy. If we relax the geometric restraints we increase p, which then leads to a larger 'allowed' R-factor ratio!
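For anyone who wants to plug in their own numbers, the formula above can be evaluated with a few lines of Python. This is only a sketch: the function names are made up, and the inputs are the values quoted in this message (p=1.5, s=0.6, d=2.8, and the reported R-factors 0.277 and 0.215).

```python
import math

def q_factor(p, d, s):
    """Q = 0.025 * p * d^3 * (1 - s), with d the resolution and s the solvent fraction."""
    return 0.025 * p * d**3 * (1.0 - s)

def expected_r_ratio(p, d, s):
    """Expected R(free)/R(work) = sqrt[(1 + Q) / (1 - Q)]."""
    q = q_factor(p, d, s)
    if not 0.0 <= q < 1.0:
        raise ValueError("formula needs 0 <= Q < 1, got Q = %.3f" % q)
    return math.sqrt((1.0 + q) / (1.0 - q))

expected = expected_r_ratio(p=1.5, d=2.8, s=0.6)
observed = 0.277 / 0.215
print("expected ratio: %.2f, observed ratio: %.2f" % (expected, observed))

# Raising p (looser restraints) raises the 'allowed' ratio.
print("p = 2.0 gives %.2f" % expected_r_ratio(p=2.0, d=2.8, s=0.6))
```

The guard on Q is there because the expression is undefined once Q reaches 1, which happens at low resolution with large p.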

Best wishes, George


Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582
*******************************************************
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:  +49-89-2180-76999
E-mail: [EMAIL PROTECTED]
*******************************************************
--
Dean R. Madden, Ph.D.
Department of Biochemistry
Dartmouth Medical School
7200 Vail Building
Hanover, NH 03755-3844 USA

tel: +1 (603) 650-1164
fax: +1 (603) 650-1128
e-mail: [EMAIL PROTECTED]



