Revisiting (and testing) an old question:

On 08/12/2003 02:38 PM, wgsc...@chemistry.ucsc.edu wrote:
***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***

On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:

(1) you only need to take special care for choosing a test set if you _apply_
the NCS in your refinement, either as restraints or as constraints. If you
refine your NCS protomers without any NCS restraints/constraints, both your
protomers and your reflections will be independent, and thus no special care
for choosing a test set has to be taken

If your space group is P6 with only one molecule in the asymmetric unit but you instead choose the 
subgroup P3 in which to refine it, and you now have two molecules per asymmetric unit related by 
"local" symmetry to one another, but you don't apply it, does that mean that reflections 
that are the same (by symmetry) in P6 are uncorrelated in P3 unless you apply the "NCS"?

===================================================
The experiment described below  seems to show that Dirk's initial
statement was correct: even in the case where the "ncs" is actually
crystallographic, and the free set is chosen randomly, R-free is not
affected by how you pick the free set.  A structure is refined with
artificially low symmetry, so that a 2-fold crystallographic operator
becomes "NCS". Free reflections are picked either randomly (in which
case the great majority of free reflections are related by the NCS to
working reflections), or taking the lattice symmetry into account so
that symm-related pairs are either both free or both working. The final
R-factors are not significantly different, even with repeating each mode
10 times with independently selected free sets. They are also not
significantly different from the values obtained refining in the correct
space group, where there is no ncs.

Maybe this is not really surprising. Since symmetry-related reflections
have the same resolution, picking free reflections this way is one way
of picking them in (very) thin shells, and this has been reported not to
avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol 4,
897-904. Also results of Chapman et al.(Acta Cryst. D62, 227–238). And see:
http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html

But this is more significant: in cases of lattice symmetry like this,
the ncs takes working reflections directly onto free reflections. In the
case of true ncs the operator takes the reflection to a point between
neighboring reflections, which are closely coupled to that point by the
Rossmann G function. Some of these neighbors are outside the thin shell
(if the original reflection was inside; or vice versa), and thus defeat
the thin-shells strategy.  In our case the symm-related free reflection
is directly coupled to the working reflection by the ncs operator, and
its neighbors are no closer than the neighbors of the original
reflection, so if there is bias due to NCS it should be principally
through the sym-related reflection and not through its neighbors. And so
most of the bias should be eliminated by picking the free set in thin
shells or by lattice symmetry.

Also, since the "ncs" is really crystallographic, we have the control of
refining in the correct space group where there is no ncs. The R-factors
were not significantly different when the structure was refined in the
correct space group. (Although it could be argued that that leads to a
better structure, and the only reason the R-factors were the same is
that bias in the lower symmetry refinement resulted in lowering Rfree
to the same level.)

Just one example, but it is the first I tried- no cherry-picking. I
would be interested to know if anyone has an example where taking
lattice symmetry into account did make a difference.

For me the lack of effect is most simply explained by saying that, while
of course ncs-related reflections are correlated in their Fo's and Fc's,
and perhaps in in their |Fo-Fc|'s, I see no reason to expect that the
_changes_ in |Fo-Fc| produced by a step of refinement will be correlated
(I can expound on this). Therefore whatever refinement is doing to
improve the fit to working reflections is equally likely to improve or
worsen the fit to sym-related free reflections. In that case it is hard
to see how refinement against working reflections could bias their
symm-related free reflections.  (Then how does R-free work? Why does
R-free come down at all when you refine? Because of coupling to
neighboring working reflections by the G-function?)

Summary of results (details below):
0. structure 2CHR, I422, as reported in PDB, with 2-Sigma cutoff)
 R: 0.189          Rfree: 0.264  Nfree:442(5%)   Nrefl: 9087

1. The deposited 2chr (I422) was refined in that space group with the
original free set. No Sigma cutoff, 10 macrocycles.
 R: 0.1767         Rfree: 0.2403  Nfree:442(5%)   Nrefl: 9087

2. The deposited structure was refined in I422 10 times, 50 macrocycles
each, with randomly picked 10% free reflections
 R: 0.1725±0.0013   Rfree: 0.2507±0.0062  Nfree: 908.9±   Nrefl: 9087

3. The structure was expanded to an I4 dimer related by the unused I422
crystallographic operator, matching the dimer of 1chr. This dimer was
refined against the original (I4) data of 1chr, picking free reflections
in symmetry related pairs. This was repeated 10 times with different
random seed for picking reflections.
R: 0.1666±0.0012   **Rfree:0.2523±0.0077   Nfree: 1601.4  Nrefl:16011

4. same as 3 but picking free reflections randomly without regard for
lattice symmetry.
On average 15 free reflections were in pairs, 212 were invariant under
the operator (no sym-mate) and 1374 (86%) were paired with working
reflections.
R: 0.1674±0.0017   **Rfree:0.2523±0.0050  Nfree: 1600.9 Nrefl:16011

(**-Average Rfree almost identical by coincidence- the individual
results were all different)

Detailed results from the individual refinement runs are available in
spreadsheet in dropbox:
https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
Scripts used in running the tests are also there in NCSbias.tgz:
https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0

========================================

Methods:
I would like an experiment where relatively complete data is available
in the lower symmetry. To get something that is available to everyone, I
choose from the PDB. A good example is 2CHR, in space group I422, which
was originally solved and the data deposited in I4 with two molecules in
the asymmetric unit(structure 1CHR).

2CHR statistics from the PDB:
        R       R-free  complete  (Refined 8.0 to 3.0 A
        0.189   0.264   81.4      reported in PDB, with 2-Sig cutoff)
                                   Nfree=442  (4.86%)
Further refinement in phenix with same free set, no sigma cutoff:
 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
 Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
 Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
 Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
  (2chr_orig_001.pdb,

The number of free reflections is small, so the uncertainty
in Rfree is large (a good case for Rcomplete)
Instead for better statistics, use new 10% free set and repeat 10 times;
50 macrocycles, with different random seeds:
 R: 0.1725±0.0013   Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
  Nfree: 908.9±0.32   Nrefl: 9087

For artificially low symmetry, expand the I422 structure (making what I
call 3chr for convenience although I'm sure that ID has been taken):

pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
exclude header
spacegroup I4
cell 111.890  111.890  148.490  90.00  90.00  90.00
symgen  X,Y,Z
symgen X,1-Y,1-Z
CHAIN SYMMETRY 2 A B
eof

Get the structure factors from 1CHR: 1chr-sf.cif
Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
This file has no free set (deposited 1993) so tell phenix to generate
one. I don't want phenix to protect me from my own stupidity, so I use:
        generate = True
        use_lattice_symmetry = False
        use_dataman_shells = False
   (the .eff file with all non-default parameters is available as
3chr_rand_001.eff in the .tgz mentioned above)

For more significance, use the script multirefine.csh to repeat the refinement 
10 times with different random seed.After each run, grep significant results 
into a log file.


To check this gives free reflections related to working reflections, I
used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
(columns 4-5) and with h<k (col 6-7), listed the pairs, thusly:

mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
  LABIN FP=F-obs DUM1=R-free-flags
  OUTPUT USER '(3I4,2F10.5)'
eof
sortfree <<eof >sort3.hkl

sort3.hkl  looks like:
                 ______h>k______     ______h<k______
   h   k   l      F         free     F*         free*
   1   2   3    208.97      0.00    174.95      0.00
   1   2   5    226.85      0.00    191.65      0.00
   1   2   7    144.85      0.00    164.86      0.00
   1   2   9    251.26      0.00    261.71      0.00
   1   2  11    333.84      0.00    335.18      0.00
   1   2  13    800.37      0.00    791.77      0.00
   1   2  15    412.92      0.00    409.90      0.00
   1   2  17    306.99      0.00    317.53      0.00
   1   2  19    225.54      0.00    220.91      0.00
   1   2  21    101.20      1.00*   104.84      0.00
   1   2  23    156.27      0.00    156.49      0.00
   1   2  25    202.97      0.00    202.23      0.00
   1   2  27    216.10      0.00    219.28      0.00
   1   2  29    106.76      0.00    100.93      0.00
   1   2  31    157.32      0.00    154.37      1.00*
   1   2  33     71.84      0.00     20.78      0.00
   1   2  35    179.05      0.00    165.67      0.00
   1   2  37    254.04      0.00    239.96      1.00*
   1   2  39     69.56      0.00     30.61      0.00
   1   2  41     56.20      0.00     51.02      0.00

, and awked for 1 in the free columns. Out of 6922 pairs of reflections,
in one case:
674 in the first asu (h>k) are in the free set,
703 in the second asu (h<k) are in the free set
only 11 pairs have the reflections in both asu free.

out of 16011 refl in I4,
6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.

out of 1601 free reflections:
On average 15 free reflections were in pairs, 212 were invariant under
the operator (no sym-mate) and 1374 (86%) were paired with working
reflections.

Then do 10 more runs of 50 macrocycles with:
   use_lattice_symmetry = False
   collecting the same statistics
(also scripted in multirefine.csh)

Finally, use ref2chr.eff to refine (as previously mentined) a monomer in I422 
(2chr.pdb) 10 times with 10% free, 50 macrocycles
(also scripted in multirefine.csh)

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1

Reply via email to