On 06/05/2019 10:07 AM, Randy Read wrote:
Dear Ian,

I think the missing ingredient in your argument is an assumption that may be 
implicit in what others have written: if you have NCS in your crystal, you 
should be restraining that NCS in your model.  If you do that, then the 
NCS-related Fcalcs will be similar (especially in the particularly problematic 
case where the NCS is nearly crystallographic), and if the working reflections 
are over-fit to match the Fobs values, then the free reflections that are 
related by the same NCS will also be overfit.  So the measurement errors don't 
have to be correlated, just the modelling errors.


Randy,
"Overfit" is a rather vague term, at least for me. I would prefer to consider
definite quantities, such as the reduction in |Fo-Fc| of a free reflection as a
result of refining against a quasi-sym-related working reflection ("quasi"
because in cases of real ncs the operator does not directly relate reflections).
If (as Ian is assuming) the errors in Fobs are random, and if (and only if) that
implies that the (Fo-Fc) values are uncorrelated, then it wouldn't matter that
the changes in Fc are correlated:

Say the model is pretty close, but an error in Fobs makes Fobs greater than Fc
for the working reflection. Refining against the working reflections will tend
to change the structure in a way that inappropriately increases Fcalc for the
working reflection to more closely match the erroneously high Fobs ("fitting the
noise"). And if we are constraining symmetry, it will equally increase Fcalc for
the sym-related free reflection. But will this increase or decrease Rfree?

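If Fobs for the free reflection is too low due to random error, Fo-Fc for it
will be negative, and increasing Fc will make |Fo-Fc| larger, _increasing_
Rfree. If instead its Fobs is too high, the same increase in Fc will lower
Rfree, and with purely random errors the two cases are equally likely.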

To get ncs bias, you need BOTH ncs-correlation in the dFc's from a step of
refinement AND ncs-correlation in the (Fo-Fc) values. If either of these fails,
there is no explanation for ncs bias. (And since no counter-examples have been
brought forward, and the results of Jonathan's experiments complement mine
nicely, there doesn't seem to be any such phenomenon to be explained!) There
seems to be no evidence for ncs bias, at least when ncs is not restrained,
which is what Dirk Kostrewa was maintaining.
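The "need BOTH" argument can be checked with a small Monte Carlo sketch.
Everything in it is a hypothetical toy, not any refinement program: the
residuals r = Fo - Fc are standard normal, the free mate's residual has
correlation `resid_corr` with the working one, one refinement step moves Fc at
the working reflection toward its Fo by a fixed amount, and the free mate either
receives the identical shift (ncs tied) or an unrelated one. `one_step` is an
illustrative name.

```python
import numpy as np


def one_step(n_pairs, resid_corr, shifts_tied, step=0.05, seed=0):
    """Mean change in |Fo-Fc| at working and free reflections after one toy step.

    Hypothetical model: r_w, r_f are the residuals (Fo - Fc) of a working
    reflection and its ncs-related free mate, standard normal with
    correlation resid_corr. Refinement raises Fc by step*sign(r_w) at the
    working reflection; the free mate gets the same shift if shifts_tied,
    otherwise a shift of random sign.
    """
    rng = np.random.default_rng(seed)
    r_w = rng.standard_normal(n_pairs)
    noise = rng.standard_normal(n_pairs)
    r_f = resid_corr * r_w + np.sqrt(1.0 - resid_corr**2) * noise

    d_fc_w = step * np.sign(r_w)                      # move Fc toward Fo
    if shifts_tied:
        d_fc_f = d_fc_w                               # ncs-restrained change
    else:
        d_fc_f = step * rng.choice([-1.0, 1.0], size=n_pairs)  # unrelated change

    # Raising Fc by dFc lowers the residual Fo - Fc by dFc.
    d_work = np.mean(np.abs(r_w - d_fc_w) - np.abs(r_w))
    d_free = np.mean(np.abs(r_f - d_fc_f) - np.abs(r_f))
    return d_work, d_free


for rho in (0.0, 0.9):
    for tied in (False, True):
        d_work, d_free = one_step(200_000, rho, tied)
        print(f"resid_corr={rho:.1f} tied={tied!s:<5} "
              f"work {d_work:+.4f}  free {d_free:+.4f}")
```

In this toy, only the combination of correlated residuals and tied shifts makes
the free |Fo-Fc| fall along with the working one; in the other three cases the
free change stays near zero (slightly positive, a second-order effect of the
step size).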

Best wishes,

Randy

On 5 Jun 2019, at 13:58, Ian Tickle <ianj...@gmail.com> wrote:


Hi Jon

Sorry, I didn't intend my response to be interpreted as saying that anyone 
has suggested directly that the measurement errors of NCS-related reflection 
amplitudes are correlated.  In fact the opposite is almost certainly true since 
the only obvious way in practice that errors in Fobs could be correlated is via 
errors in the batch scale factors which would introduce correlations between 
errors in Fobs for reflections in the same or adjacent images, but that has 
nothing to do with NCS.  That's the 'elephant in the room': no-one has 
suggested that reflections on the same or adjacent images should not be split 
between the working and test sets, yet that's easily the biggest contributor to 
CV bias with or without NCS!  I think taking that effect into account would be 
much more productive than worrying about NCS, but performing the test-set 
sampling in shells can't possibly address that, since the images obviously cut 
across all shells.

The point I was making was that correlation of errors in NCS-related Fobs would 
appear to be the inevitable _implication_ of what certainly has been claimed, 
namely that NCS can introduce bias into CV statistics if the test-set sampling 
is not done correctly, i.e. by splitting NCS-related Fobs between the working 
and test sets.  Unless there's something I've missed, that's the only possible
explanation for that claim.  This is because overfitting results from fitting
the model to the errors in Fobs, and the CV bias arises from correlation of 
those errors if the NCS-related Fobs are split up, thus causing the degree of 
overfitting to be underestimated and giving a too-rosy picture of the structure 
quality.  Indeed you seem to be saying that because the NCS-related Fobs are
correlated (a patently true statement), it follows that the errors in those
Fobs are also correlated, or at least more correlated than for non-NCS-related
Fobs, but I just don't see how that can be true.

Rfree is not unbiased: as a measure of the agreement it is biased upwards by
overfitting (otherwise how could it be used to detect overfitting?), because the
model fails to fit the uncorrelated errors in the test-set Fobs, just as Rwork is
biased downwards by fitting to the errors in the working-set Fobs.  Overfitting
becomes apparent as soon as you perform any refinement, so the only point at
which there is no overfitting is for the initial model, when Rwork and Rfree are
equal apart from a small difference arising from random sampling of the test set
(that sampling error could be reduced by performing refinements with all 20
working/test-set combinations and averaging the R values).  From there on, the
'gap' between Rwork and Rfree is a measure of the degree of overfitting, so we
should really be taking some average of Rwork and Rfree as the true measure of
agreement (though the biases are not exactly equal and opposite, so it's not a
simple arithmetic mean).  The goal of choosing the appropriate refinement
parameters, restraints and weights is to _minimise_ overfitting, not eliminate
it.  It is not possible to eliminate it completely: if it were, then Rwork and
Rfree would become equal (apart from that small effect from random sampling).
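To make the "average over all 20 test sets" idea concrete, here is a minimal
sketch with made-up numbers. The arrays are synthetic (Rayleigh-distributed
amplitudes plus noise), `r_factor` and `per_fold_r` are hypothetical helper
names, and Fc is held fixed, so this only models the initial, unrefined model,
for which Rwork and Rfree differ purely through test-set sampling. In real use
each of the 20 folds would need its own refinement.

```python
import numpy as np


def r_factor(fo, fc):
    """Conventional R = sum|Fo - Fc| / sum Fo."""
    return np.sum(np.abs(fo - fc)) / np.sum(fo)


def per_fold_r(fo, fc, n_folds=20, seed=0):
    """(Rwork, Rfree) for each of n_folds disjoint test sets; Fc held fixed."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(fo)) % n_folds   # balanced random fold labels
    rows = []
    for k in range(n_folds):
        free = fold == k
        rows.append((r_factor(fo[~free], fc[~free]), r_factor(fo[free], fc[free])))
    return np.array(rows)


# Synthetic data (assumption): acentric-like amplitudes plus model error.
rng = np.random.default_rng(42)
fo = rng.rayleigh(scale=100.0, size=9000)
fc = np.abs(fo + rng.normal(0.0, 25.0, size=fo.size))

res = per_fold_r(fo, fc)
print("overall R           ", round(r_factor(fo, fc), 4))
print("Rfree per fold      ", round(res[:, 1].min(), 4), "to", round(res[:, 1].max(), 4))
print("mean Rfree (20 sets)", round(res[:, 1].mean(), 4))
```

The single-fold Rfree values scatter around the overall R, whereas their mean
over all 20 folds sits very close to it; that scatter is the sampling error the
averaging is meant to remove.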

I don't follow your argument about correlation of Fobs from NCS.  Overfitting,
and therefore CV bias, arises from the _errors_ in the Fobs, not from the Fobs
themselves, and there's no reason to believe that the Fobs should be correlated
with their errors.  You say "any correlation between the test-set and the
working-set F's due to NCS would be expected to reduce R-free".  If the working
and test sets are correlated by NCS, that would mean that Rwork is correlated
with Rfree, so they would be reduced equally!  There are two components of the
Fobs - Fcalc difference: Fcalc - Ftrue (the model error) and Fobs - Ftrue (the
data error).  The former is completely correlated between the working and test
sets (obviously, since it's the same model), so what you do to one you must do
to the other.  The latter can only be correlated between NCS-related reflections
if NCS has an effect on the errors in the Fobs, which it doesn't, or through
some other effect unrelated to NCS, such as errors in batch scales.
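Written out explicitly (notation mine: ε for the data error, μ for the model
error, h and h′ an NCS-related pair), the covariance of the two residuals splits
as below. The cross terms between ε and μ vanish only for a model that has not
been fitted to these data; fitting is precisely what couples μ to ε, i.e. the
overfitting discussed above.

```latex
\begin{aligned}
F_{\mathrm{obs}}(\mathbf{h}) - F_{\mathrm{calc}}(\mathbf{h})
  &= \underbrace{\bigl(F_{\mathrm{obs}}(\mathbf{h}) - F_{\mathrm{true}}(\mathbf{h})\bigr)}_{\varepsilon(\mathbf{h})\ \text{(data error)}}
   - \underbrace{\bigl(F_{\mathrm{calc}}(\mathbf{h}) - F_{\mathrm{true}}(\mathbf{h})\bigr)}_{\mu(\mathbf{h})\ \text{(model error)}} \\[4pt]
\operatorname{Cov}\bigl(\varepsilon(\mathbf{h}) - \mu(\mathbf{h}),\ \varepsilon(\mathbf{h}') - \mu(\mathbf{h}')\bigr)
  &= \underbrace{\operatorname{Cov}\bigl(\varepsilon(\mathbf{h}), \varepsilon(\mathbf{h}')\bigr)}_{=\,0\ \text{if data errors are independent}}
   + \operatorname{Cov}\bigl(\mu(\mathbf{h}), \mu(\mathbf{h}')\bigr)
\end{aligned}
```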

Overfitting is related to the data/parameter ratio so you don't observe the 
effects of overfitting until you change the model, the parameter set or the 
restraints.  If there were no errors there would be no overfitting and no CV 
bias (actually there would be no need for cross-validation!).

Of course as you say, your tests suggest that there is no CV bias from NCS, in 
which case there's absolutely nothing to explain!

Cheers

-- Ian


On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper 
<00000c2488af9525-dmarc-requ...@jiscmail.ac.uk> wrote:

    Ian, statistics is not my forte, but I don't think anyone is suggesting that
    the measurement errors of NCS-related reflection amplitudes are correlated.
    In simple terms, since NCS-related F's should be correlated, the working-set
    reflection amplitudes could be correlated with those in the test set if the
    latter is chosen randomly rather than in shells. Am I right in saying that
    R-free not only indicates over-fitting but also acts as an unbiased measure
    of the agreement between Fo and Fc? During a well-behaved refinement run, in
    the cycles before any over-fitting becomes apparent, the decrease in R-free
    will indicate that the changes being made to the model are making it more
    consistent with the Fo's. In these stages, any correlation between the
    test-set and the working-set F's due to NCS would be expected to affect
    R-free (cross-validation bias), making it lower than it would be if the test
    set had been chosen in resolution shells? However, you are always right and,
    as you know, I failed to detect any such effect in my limited tests. Thanks
    to you and others for replying.


    On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <ber...@upstate.edu> wrote:


    On 05/19/2019 08:21 AM, Ian Tickle wrote:
    ~~~
    >> So there you have it: what matters is that the _errors_ in the
    >> NCS-related amplitudes are uncorrelated, or at least no more correlated
    >> than the errors in the non-NCS-related amplitudes, NOT the amplitudes
    >> themselves.

    Thanks, Ian!

    I would like to think that it is the errors in Fobs that matter (as may be
    the case), because then:
    1. ncs would not bias R-free even if you _do_ use ncs
       constraints/restraints (changes in Fcalc due to a step of refinement
       would be positively correlated between sym-mates, but if the sign of
       (Fo-Fc) is opposite at the sym-mate, what improves the working
       reflection would worsen the free one).
    2. There would be no need to use the same free set when you refine the
       structure against a new dataset (as for ligand studies), since the
       random errors of measurement in Fobs in the two sets would be unrelated.

    However, when I suggested that in a previous post, I was reminded that
    errors in Fobs account for only a small part of the difference (Fo-Fc). The
    remainder must be due to the inability of our simple atomic models to
    represent the actual electron density, or its diffraction; and for a
    symmetric structure and a symmetric model, that difference is likely to be
    symmetric. Whether that difference represents "noise" that we want to avoid
    fitting is another question, but it is likely that (Fo-Fc) will be
    correlated between sym-mates. So I settled for convincing myself that the
    changes in Fc brought about by refinement would be uncorrelated, and thus
    the _changes_ in (Fo-Fc) at each step would be uncorrelated.

    Below are some of the ideas I came up with in trying to think about this,
    and about bias in general. (Not very well organized and not the best of
    prose, but if one is a glutton for punishment, or just wants to see how the
    mind of a madman works . . .)

    Warning: some of this is contrary to current consensus opinion, and the
    conclusions may be, in the words of a popular autobuilding program, partly
    WRONG! In particular, the idea that coupling by the G-function does not bias
    R-free, but rather is the only reason that R-free works at all!
    - - - - - - - - - -

    The differences (Fo-Fc) can be divided between (1) errors in measurement
    of reflection intensities and (2) failure of the model to represent the
    true structure. The first can be considered "noise", and we would expect
    it to be random, with no correlation between symmetry mates.
    However most of the difference between Fc and Fobs is not due to random
    noise in the data, but to failures of our model to accurately represent
    the real thing. These differences are likely to be ncs-symmetric.
    Leaving aside the question of whether or not we want to fit this kind of
    "noise" (bringing the model closer to the real structure?), we conclude
    that (Fo-Fc) is likely to be correlated between ncs-mates.

    But for refinement against the working set to bias the contribution of
    sym-related free-set reflections to R-free would require that _changes_
    in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
    contrary they are not correlated, i.e. if a change that decreases
    |Fo-Fc| for a working reflection is equally likely to decrease or
    increase |Fo-Fc| for its sym mate (which may be) in the free set, then
    it is hard to see how refinement against the working reflection would
    bias R-free.

    Under what conditions would the changes in |Fo-Fc| for symmetry-related
    reflections be correlated? This would be the case if the changes in Fc
    correlate AND the sign of (Fo-Fc) correlates. Again, if the difference were
    only due to random error in Fobs, then the sign of (Fo-Fc) for a
    symmetry-related reflection would be as likely to be the opposite as the
    same (as for the original reflection), so even if the changes in Fc are
    correlated, what improves the fit to the original reflection would be as
    likely to worsen the fit to its mate. But we concluded above that (Fo-Fc)
    is likely to be correlated by symmetry, since the shortcomings of our model
    are likely to be symmetric. So we ask whether the changes in Fc are
    correlated.

    So why should a structural change result in correlated changes of
    symmetry-related Fc's?
    The Fc is the amplitude of the best-fit sine wave (of the specified
    frequency) to the projection of the density of the crystal onto the
    specified scattering vector. The refinement program can increase Fcalc
    by moving an atom so that its projection on the scattering vector moves
    toward a peak of that sine wave, or decrease it by moving the atom away
    from a peak. If the projection of an atom on the scattering vector moves
    toward a peak, the density becomes more peaked and the amplitude
    increases; if it moves toward a trough, it tends to take density away from
    the peak or fill in the trough, and the density becomes flatter.
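A one-dimensional sketch of this picture, assuming unit-weight point atoms whose
positions x have already been projected onto the scattering vector (fractional
units) and a single reflection of order h. The h-th Fourier component of the
density is proportional to |F| cos(2πhx − φ), so its crests sit where
2πhx = φ (mod 2π); nudging an atom toward the nearest crest should raise |F|,
and nudging it away should lower it. `structure_factor` and `move_atom` are
hypothetical helpers.

```python
import numpy as np


def structure_factor(x, h):
    """F(h) = sum_j exp(2*pi*i*h*x_j) for unit point atoms at projected positions x."""
    return np.sum(np.exp(2j * np.pi * h * x))


def move_atom(x, j, h, toward=True, dx=1e-4):
    """Shift atom j by dx toward (or away from) the nearest crest of the h-th wave.

    Returns the new positions and the atom's phase offset from the crest.
    """
    phi = np.angle(structure_factor(x, h))
    # Phase of atom j relative to the crest, wrapped into (-pi, pi].
    offset = np.angle(np.exp(1j * (2 * np.pi * h * x[j] - phi)))
    direction = -np.sign(offset) * np.sign(h)    # this direction shrinks |offset|
    if not toward:
        direction = -direction
    x_new = x.copy()
    x_new[j] += direction * dx
    return x_new, offset


rng = np.random.default_rng(1)
x = rng.random(50)          # 50 atoms at random projected positions
h = 7
f0 = abs(structure_factor(x, h))
x_in, off = move_atom(x, 0, h, toward=True)
x_out, _ = move_atom(x, 0, h, toward=False)
print(f"offset {off:+.2f} rad: |F| {f0:.4f} -> toward {abs(structure_factor(x_in, h)):.4f}, "
      f"away {abs(structure_factor(x_out, h)):.4f}")
```

For an atom sitting almost exactly on a crest or trough (offset near 0 or ±π)
the first-order change vanishes and second-order terms decide the result, so
the rule of thumb applies to atoms on the flanks of the wave.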

    But the scattering vector of a sym-related reflection is at a different
    angle, anywhere from almost 0 to 90 degrees from its mate (actually up to
    180°, but then the Friedel mate is close to 0°; it's a question of how
    parallel they are, irrespective of direction). The atom we are changing
    will fall at a different position along the rotated scattering vector,
    and its movement may be toward a peak or a trough of the projected density
    on that scattering vector.

    If the two reflections are close in reciprocal space, their scattering
    vectors will be nearly collinear, the projection of density onto them
    will be similar, and the projection of the atom being moved onto them
    will come at a similar position in these projections. In that case
    moving density so that its projection on one scattering vector moves
    toward or away from a peak of its best-fit sine wave will have a similar
    effect for the adjacent reflection, and their changes will be correlated.

    But if the reflections are not close in reciprocal space, their
    scattering vectors are at different angles, the projection of the
    density on them looks quite different, and the projection of the atom
    being moved comes at a different position. In this case it is impossible
    to predict how changes in the two reflections' amplitudes due to
    movement of an atom will correlate without knowing the details of the
    density.

    For symmetry-related reflections, the projection of the density of the
    rotated protomer on the scattering vector of the rotated reflection will
    be the same as the projection of the density of the original protomer on
    the original reflection (hence the correlation of Fc). (If the symmetry
    is actually crystallographic, as in our case, then the projection of the
    entire crystal on the rotated scattering vector will be the same as its
    projection on the original reflection's scattering vector.) But the
    change we are making is only in the original protomer, not in its
    symmetry mate, and so its projection will fall at a different point along
    the rotated scattering vector, so whether it moves density toward a peak
    or a trough is somewhat random.

    If ncs is restrained or constrained, the changes will
    also follow ncs-symmetry and so changes in Fc would be expected to be
    symmetric.

    I have done extensive experiments, again with the same 2CHR structure
    refined with I4 symmetry, showing that when you introduce a change in
    the structure by random shaking or molecular dynamics, the correlation
    between changes in Fc for "ncs" symmetry-related reflections is close to
    zero, and occasionally negative. The slight positive average correlation
    may be attributed to sym-pairs that are close in reciprocal space (like
    1,0,30 and -1,0,30 if there were a 2-fold along 0,0,l), so that they are
    coupled not by ncs but by the G-function. Granted, changes due to shaking
    might not be the same as changes due to refinement, but these were shaken
    starting from the refined position, and I assume that if they were refined
    from this randomly shaken position they would go back to the original
    refined position, in which case the Fc changes due to refinement would
    be equally uncorrelated.
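A toy version of this shaking test can be run on random coordinates; it does not
reproduce the 2CHR experiment, only its logic. All of the following are
assumptions for illustration: a P1 box holding two protomers related by a
2-fold along c, (x,y,z) → (−x,−y,z), so that (h,k,l) and (−h,−k,l) are the
"ncs" mates; unit point atoms; and Gaussian shakes of 0.005 in fractional
units. `fcalc`, `mate_pairs` and `shake_correlation` are illustrative names.

```python
import numpy as np

# Hypothetical 2-fold along c acting on fractional coordinates: (x, y, z) -> (-x, -y, z).
TWOFOLD = np.diag([-1.0, -1.0, 1.0])


def fcalc(hkl, xyz):
    """Unit-weight point-atom structure factors F(hkl) = sum_j exp(2*pi*i*hkl.x_j)."""
    return np.exp(2j * np.pi * (hkl @ xyz.T)).sum(axis=1)


def mate_pairs(hmax=6, lmax=6):
    """One row per pair {h, h'} with h' = (-h, -k, l); invariant 00l rows are skipped."""
    rows = [(h, k, l)
            for h in range(0, hmax + 1)
            for k in range(-hmax, hmax + 1)
            for l in range(0, lmax + 1)
            if h > 0 or k > 0]
    hkl = np.array(rows, dtype=float)
    return hkl, hkl @ TWOFOLD        # TWOFOLD is diagonal, so this gives (-h, -k, l)


def shake_correlation(constrained, n_atoms=100, sigma=0.005, seed=0):
    """Correlation of delta|Fc| between mates after one random shake of the dimer."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_atoms, 3))
    model = np.vstack([a, a @ TWOFOLD.T])          # protomer A plus its 2-fold mate B
    if constrained:
        a_new = a + rng.normal(0.0, sigma, a.shape)
        shaken = np.vstack([a_new, a_new @ TWOFOLD.T])   # B rebuilt from shaken A
    else:
        shaken = model + rng.normal(0.0, sigma, model.shape)   # A and B independent

    def delta_amp(hkl):
        return np.abs(fcalc(hkl, shaken)) - np.abs(fcalc(hkl, model))

    h, h_mate = mate_pairs()
    return np.corrcoef(delta_amp(h), delta_amp(h_mate))[0, 1]


print("independent shakes:", round(shake_correlation(constrained=False), 3))
print("ncs-constrained:   ", round(shake_correlation(constrained=True), 3))
```

With independent shakes the changes in |Fc| for the mates come out nearly
uncorrelated, while the constrained shake keeps the model exactly symmetric, so
the changes for the two mates are identical.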

    ----------

    Coupling between reflections by the G function:
    Without saying exactly what is meant by coupling, reflections can be
    coupled in two ways. First, reflections are coupled to other reflections
    near them in reciprocal space. This is because the molecular transform of
    the molecule is relatively smooth (because the transform is oversampled,
    the asymmetric unit being larger than the structure it contains?), so the
    amplitude and phase of a reflection cannot differ too widely from those of
    neighboring reflections. Or because the scattering vectors of neighboring
    reflections are nearly parallel and similar in frequency, so the
    projection of the density on them integrates similarly.
    (The second way is ncs coupling.)
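To get a feel for how strongly a reflection is tied to its neighbours, the
Rossmann-Blow interference function for an idealised spherical envelope of
radius r is G(H) = 3[sin(2πHr) − 2πHr cos(2πHr)]/(2πHr)³, where H is the
reciprocal-space separation between the two points. The sketch below evaluates
it and locates the edge of its central maximum (the first root of tan x = x,
near x = 4.493). The spherical envelope and the radius and cell edges in the
usage lines are assumptions for illustration, not values for 2CHR.

```python
import math


def g_sphere(H, r):
    """Interference function G for a spherical envelope of radius r (Å) at separation H (1/Å)."""
    x = 2.0 * math.pi * H * r
    if abs(x) < 1e-3:
        return 1.0 - x * x / 10.0          # series expansion avoids cancellation near 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3


def central_max_radius(r, tol=1e-12):
    """Separation H at which G first reaches zero, found by bisection on x in [4, 5]."""
    lo, hi = 4.0 / (2.0 * math.pi * r), 5.0 / (2.0 * math.pi * r)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_sphere(mid, r) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r = 20.0                                  # hypothetical molecular radius, Å
for a in (60.0, 120.0):                   # hypothetical cell edges, Å
    print(f"a = {a:5.0f} Å: G between adjacent reflections = {g_sphere(1.0 / a, r):.3f}")
print(f"central maximum ends at H = {central_max_radius(r):.4f} 1/Å")
```

The larger the cell relative to the molecule, the closer adjacent reflections
lie inside the central maximum and the more strongly they are coupled, which is
the oversampling argument above in numerical form.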

    In general, coupling of neighboring reflections is a good thing for
    crystallography. No one reflection is indispensable, because its
    information is much the same as that of the 26 surrounding reflections in a
    cube. This allows us to solve structures when the data are only 80-90%
    complete, provided the missing reflections are randomly scattered among
    the present reflections. It supports the "fill-in" FFT map procedure where
    FcΦc is used for missing reflections (the structure based on the
    surrounding reflections will be good enough to give a good estimate of the
    missing structure factor). It makes possible resolution extension during
    density modification or by the "free lunch" procedures of Dodson and
    Sheldrick.

    And I would argue that this coupling is what makes cross-validation
    (free-R) work. We say that refining against the working reflections
    improves the structure, making it more like the true structure, and thus
    the free Fc approach their Fobs. But not because the good fairy looks at
    the structure and says "OK, it's improved now, we can lower the R-free".
    How does it work mathematically? If the reflections were completely
    independent, i.e. if free and working reflections were not coupled through
    being samples of the same molecular transform, then changes which improve
    the fit to the working reflections would have no effect on the values of
    the free reflections. The effect has to go through the structure: changes
    due to refining against the working reflections affect the free
    reflections. We can call this "coupling", and we know it is described by
    the G-function. If free reflections were not coupled to working
    reflections, Rfree would never change and thus would be useless.

    For example, suppose we refine the position of an atom, choosing working
    reflections only in the plane l=0 and free reflections along the l axis
    (assuming an orthorhombic system). The working reflections are only
    sensitive to position in the x and y directions, so the z position would
    be unchanged by the refinement. But the free reflections are only
    sensitive to position along the z axis, so R-free would be unchanged.
    Presumably the structure would be improved (if that one atom was slightly
    misplaced and all other atoms correctly placed), but Rfree would not
    improve. I would say this is the direction Chapman and co. were heading
    with their thin shells of free reflections isolated by thick shells of
    unused guard reflections. If they really succeed in eliminating the
    "bias", then Rfree will be unresponsive to refinement and so useless.
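This thought experiment can be checked numerically. The sketch assumes 30 unit
point atoms at random fractional coordinates, atom 0 misplaced by a hypothetical
(0.01, −0.01, 0.02), hk0 reflections as the working set and 00l reflections as
the free set. "Refinement" is idealised as restoring atom 0's x and y exactly,
which is the most the hk0 data can do, since their phases 2π(hx + ky) do not
involve z. The helper names are illustrative.

```python
import numpy as np


def fcalc(hkl, xyz):
    """Unit-weight point-atom structure factors."""
    return np.exp(2j * np.pi * (hkl @ xyz.T)).sum(axis=1)


def r_factor(fo, fc):
    return np.sum(np.abs(fo - fc)) / np.sum(fo)


rng = np.random.default_rng(0)
true = rng.random((30, 3))
model = true.copy()
model[0] += [0.01, -0.01, 0.02]                      # hypothetical misplacement

working = np.array([(h, k, 0) for h in range(-5, 6) for k in range(-5, 6)
                    if (h, k) != (0, 0)], dtype=float)
free = np.array([(0, 0, l) for l in range(1, 11)], dtype=float)
fo_work, fo_free = np.abs(fcalc(working, true)), np.abs(fcalc(free, true))


def r_stats(xyz):
    """(Rwork, Rfree) of a model against the 'observed' amplitudes of the true structure."""
    return (r_factor(fo_work, np.abs(fcalc(working, xyz))),
            r_factor(fo_free, np.abs(fcalc(free, xyz))))


refined = model.copy()
refined[0, :2] = true[0, :2]                         # idealised hk0-only refinement
print("before:", np.round(r_stats(model), 4))
print("after: ", np.round(r_stats(refined), 4))
```

Rwork drops to zero while Rfree does not move at all, even though the refined
model is closer to the truth.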

    Chapman et al. considered two kinds of coupling: that due to ncs and
    direct coupling via Rossmann's G function. They found that choosing the
    free set in thin shells had little effect; in fact very thick shells with
    the test reflections centered in the middle of the shell were required to
    significantly reduce the "bias". Now the reciprocal-space equivalent of an
    ncs operator is a pure rotation, so it relates points in reciprocal space
    with precisely the same resolution. Selecting free reflections in thin
    shells should thus be sufficient to ensure that ncs-related reflections
    have the same free-R flag and avoid bias. For my case, where the ncs is
    really crystallographic, the shells could be infinitely thin, since the
    symm-related reflections have precisely the same resolution. For real ncs
    the operator takes a reflection to a non-Bragg position which is closely
    surrounded by reflections, coupled to them by the G function.
    In that case somewhat thicker shells would be required. But using very
    thick guard zones around the free reflections implies it is the
    G-function they are fighting, as they somewhat implicitly acknowledged by
    discussing the thickness of shells in terms of the radius of the central
    maximum of the G function. In that case I wonder whether ncs coupling,
    which still has to go through G-function coupling to bias a free
    reflection, contributes significantly compared to the coupling of every
    reflection to its direct neighbors.
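A minimal sketch of free-set selection in thin resolution shells, assuming a
tetragonal metric (1/d² = (h² + k²)/a² + l²/c²), a hypothetical cell with
a = 100 Å and c = 50 Å, 4 Å data with I-centring, and a hypothetical 2-fold
along a, (h,k,l) → (h,−k,−l). Because the operator preserves 1/d², every
reflection lands in the same shell as its mate, however thin the shells are.
`s2_tetragonal` and `thin_shell_flags` are illustrative names, not functions
from any crystallographic package.

```python
import numpy as np


def s2_tetragonal(hkl, a, c):
    """1/d^2 for a tetragonal cell (a = b, all angles 90 degrees)."""
    h, k, l = hkl.T
    return (h * h + k * k) / a**2 + (l * l) / c**2


def thin_shell_flags(hkl, a, c, width, every=20):
    """Flag every `every`-th shell of thickness `width` (in 1/d^2) as free."""
    shell = np.floor(s2_tetragonal(hkl, a, c) / width).astype(int)
    return shell % every == 0


a, c, d_min = 100.0, 50.0, 4.0                      # hypothetical cell and resolution
grid = np.mgrid[-25:26, -25:26, -12:13].reshape(3, -1).T
keep = ((grid.sum(axis=1) % 2 == 0)                 # I-centring: h + k + l even
        & np.any(grid != 0, axis=1)
        & (s2_tetragonal(grid, a, c) <= 1.0 / d_min**2))
hkl = grid[keep]

width = (1.0 / d_min**2) / 400                      # 400 thin shells
free = thin_shell_flags(hkl, a, c, width)
mate_free = thin_shell_flags(hkl * np.array([1, -1, -1]), a, c, width)
print(f"{hkl.shape[0]} reflections, {free.mean():.1%} free, "
      f"mates agree: {np.array_equal(free, mate_free)}")
```

For real ncs the mate is a non-Bragg point, and its Bragg neighbours can fall
in an adjacent shell, which is why thicker shells are needed in that case.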

    By using thick guard zones of unused reflections, they end up refining
    with very incomplete data, which would be expected to affect the
    refinement and raise R-free just because the structure is less correct.
    They control for this by refining with another set in which the same
    number of reflections are deleted randomly. But this is not a satisfactory
    control, because it is generally agreed that missing reflections due to
    an empty zone in reciprocal space are more deleterious than missing
    reflections that are randomly scattered.
    Ironically, this same "redundancy due to oversampling" that Chapman and
    co. discuss in their introduction allows neighboring reflections to impart
    most of the information of an isolated absent reflection. When the missing
    reflections are clustered together in a thick shell or wedge, a lot of
    information is not available and the structure will suffer. In
    particular, the structural details that determine the structure factors
    in the center of the excluded zone will be poorly determined, since the
    information pertaining to them is being excluded. So of course the
    R-factor calculated from these reflections will be higher than with
    randomly absent data. Furthermore, if the G-function is the vehicle by
    which R-free follows R, R-free will follow less closely and hence
    under-report the improvement being made.






    >
    > On Sun, 19 May 2019 at 04:34, Edward A. Berry <ber...@upstate.edu> wrote:
    >
    >    Revisiting (and testing) an old question:
    >
    >    On 08/12/2003 02:38 PM, wgsc...@chemistry.ucsc.edu wrote:
    >      > ***  For details on how to be removed from this list visit the  ***
    >      > ***          CCP4 home page http://www.ccp4.ac.uk          ***
    >
    >      > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
    >      >>
    >      >> (1) you only need to take special care for choosing a test set if you _apply_
    >      >> the NCS in your refinement, either as restraints or as constraints. If you
    >      >> refine your NCS protomers without any NCS restraints/constraints, both your
    >      >> protomers and your reflections will be independent, and thus no special care
    >      >> for choosing a test set has to be taken
    >      >
    >      > If your space group is P6 with only one molecule in the asymmetric unit but you
    >      > instead choose the subgroup P3 in which to refine it, and you now have two
    >      > molecules per asymmetric unit related by "local" symmetry to one another, but
    >      > you don't apply it, does that mean that reflections that are the same (by
    >      > symmetry) in P6 are uncorrelated in P3 unless you apply the "NCS"?
    >
    >    ===================================================
    >    The experiment described below seems to show that Dirk's initial
    >    statement was correct: even in the case where the "ncs" is actually
    >    crystallographic, and the free set is chosen randomly, R-free is not
    >    affected by how you pick the free set. A structure is refined with
    >    artificially low symmetry, so that a 2-fold crystallographic operator
    >    becomes "NCS". Free reflections are picked either randomly (in which
    >    case the great majority of free reflections are related by the NCS to
    >    working reflections), or taking the lattice symmetry into account so
    >    that symm-related pairs are either both free or both working. The final
    >    R-factors are not significantly different, even when each mode is
    >    repeated 10 times with independently selected free sets. They are also
    >    not significantly different from the values obtained refining in the
    >    correct space group, where there is no ncs.
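The two ways of picking the free set can be sketched as follows. This is not the
script from NCSbias.tgz: the operator (h,k,l) → (k,h,−l) (a 2-fold along
[110]), the index box and the 10% fraction are hypothetical, and reduction to an
asymmetric unit (Laue/Friedel) is skipped for brevity, so every listed index is
treated as a separate reflection. `pick_free` and `classify` are illustrative
names.

```python
import numpy as np

# Hypothetical "lost" 2-fold in reciprocal space: (h, k, l) -> (k, h, -l).
OP = np.array([[0, 1, 0], [1, 0, 0], [0, 0, -1]])


def as_key(v):
    """Hashable index tuple of plain Python ints."""
    return tuple(int(x) for x in v)


def pick_free(hkl, op, frac, seed, paired):
    """Random free flags; if paired, both members of each {h, op(h)} share one flag."""
    rng = np.random.default_rng(seed)
    if not paired:
        return rng.random(len(hkl)) < frac
    keys = [min(as_key(h), as_key(op @ h)) for h in hkl]
    unique = sorted(set(keys))
    draw = dict(zip(unique, rng.random(len(unique)) < frac))
    return np.array([draw[k] for k in keys])


def classify(hkl, op, free):
    """Counts of free reflections: (invariant, mate also free, mate working)."""
    index = {as_key(h): i for i, h in enumerate(hkl)}
    invariant = both_free = with_working = 0
    for i in np.flatnonzero(free):
        j = index[as_key(op @ hkl[i])]
        if j == i:
            invariant += 1
        elif free[j]:
            both_free += 1
        else:
            with_working += 1
    return invariant, both_free, with_working


grid = np.mgrid[-20:21, -20:21, -10:11].reshape(3, -1).T
hkl = grid[(grid.sum(axis=1) % 2 == 0) & np.any(grid != 0, axis=1)]
for paired in (True, False):
    counts = classify(hkl, OP, pick_free(hkl, OP, 0.10, seed=0, paired=paired))
    print(f"paired={paired!s:<5} invariant, both free, with working: {counts}")
```

With pair-aware picking no free reflection has a working mate; with purely
random picking, about 90% of the non-invariant free reflections do.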
    >
    >    Maybe this is not really surprising. Since symmetry-related reflections
    >    have the same resolution, picking free reflections this way is one way
    >    of picking them in (very) thin shells, and this has been reported not to
    >    avoid bias: see Table 2 of Kleywegt and Brunger, Structure 1996, Vol 4,
    >    897-904. Also the results of Chapman et al. (Acta Cryst. D62, 227–238).
    >    And see:
    > http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html 
    >
    >    But this is more significant: in cases of lattice symmetry like this,
    >    the ncs takes working reflections directly onto free reflections. In the
    >    case of true ncs, the operator takes the reflection to a point between
    >    neighboring reflections, which are closely coupled to that point by the
    >    Rossmann G function. Some of these neighbors are outside the thin shell
    >    (if the original reflection was inside, or vice versa), and thus defeat
    >    the thin-shells strategy. In our case the symm-related free reflection
    >    is directly coupled to the working reflection by the ncs operator, and
    >    its neighbors are no closer than the neighbors of the original
    >    reflection, so if there is bias due to NCS it should be principally
    >    through the sym-related reflection and not through its neighbors. And so
    >    most of the bias should be eliminated by picking the free set in thin
    >    shells or by lattice symmetry.
    >
    >    Also, since the "ncs" is really crystallographic, we have the control of
    >    refining in the correct space group, where there is no ncs. The R-factors
    >    were not significantly different when the structure was refined in the
    >    correct space group. (Although it could be argued that that leads to a
    >    better structure, and the only reason the R-factors were the same is
    >    that bias in the lower-symmetry refinement lowered Rfree to the same
    >    level.)
    >
    >    Just one example, but it is the first I tried (no cherry-picking). I
    >    would be interested to know if anyone has an example where taking
    >    lattice symmetry into account did make a difference.
    >
    >    For me the lack of effect is most simply explained by saying that, while
    >    of course ncs-related reflections are correlated in their Fo's and Fc's,
    >    and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
    >    _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
    >    (I can expound on this). Therefore whatever refinement is doing to
    >    improve the fit to working reflections is equally likely to improve or
    >    worsen the fit to sym-related free reflections. In that case it is hard
    >    to see how refinement against working reflections could bias their
    >    symm-related free reflections.  (Then how does R-free work? Why does
    >    R-free come down at all when you refine? Because of coupling to
    >    neighboring working reflections by the G-function?)
    >
    >    Summary of results (details below):
    >    0. Structure 2CHR, I422, as reported in PDB (with 2-sigma cutoff)
    >        R: 0.189          Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
    >
    >    1. The deposited 2chr (I422) was refined in that space group with the
    >    original free set. No Sigma cutoff, 10 macrocycles.
    >        R: 0.1767        Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
    >
    >    2. The deposited structure was refined in I422 10 times, 50 macrocycles
    >    each, with randomly picked 10% free reflections
    >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±  Nrefl: 9087
    >
    >    3. The structure was expanded to an I4 dimer related by the unused I422
    >    crystallographic operator, matching the dimer of 1chr. This dimer was
    >    refined against the original (I4) data of 1chr, picking free reflections
    >    in symmetry-related pairs. This was repeated 10 times with different
    >    random seeds for picking reflections.
    >        R: 0.1666±0.0012  **Rfree:0.2523±0.0077  Nfree: 1601.4  Nrefl:16011
    >
    >    4. Same as 3, but picking free reflections randomly without regard for
    >    lattice symmetry.
    >    On average 15 free reflections were in pairs, 212 were invariant under
    >    the operator (no sym-mate), and 1374 (86%) were paired with working
    >    reflections.
    >        R: 0.1674±0.0017  **Rfree:0.2523±0.0050  Nfree: 1600.9  Nrefl:16011
    >
    >    (** Average Rfree almost identical by coincidence; the individual
    >    results were all different.)
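Whether "not significantly different" holds can be checked directly from the
summary lines above with Welch's t-test. This assumes (the post does not say)
that each "±" is the sample standard deviation over the 10 runs; if it is a
standard error instead, every t below shrinks by a further factor of about 3.
`welch_t` is a hypothetical helper. For comparison, the two-sided 5% critical
value of t with about 17 degrees of freedom is roughly 2.1.

```python
import math
from itertools import combinations


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Rfree mean, "±" value (assumed to be the SD) and number of runs, copied from the summary.
rfree = {
    "2 (I422, random 10%)": (0.2507, 0.0062, 10),
    "3 (I4, paired free set)": (0.2523, 0.0077, 10),
    "4 (I4, random free set)": (0.2523, 0.0050, 10),
}

for (name_a, stats_a), (name_b, stats_b) in combinations(rfree.items(), 2):
    t, df = welch_t(*stats_a, *stats_b)
    print(f"{name_a} vs {name_b}: t = {t:+.2f}, df = {df:.1f}")
```

Under that assumption none of the three Rfree comparisons comes near the
threshold (|t| is about 0.5 or less for 2 vs 3, below 0.7 for 2 vs 4, and zero
for 3 vs 4).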
    >
    >    Detailed results from the individual refinement runs are available in a
    >    spreadsheet on Dropbox:
    > https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0 
    >    Scripts used in running the tests are also there in NCSbias.tgz:
    > https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0 
    >
    >    ========================================
    >
    >    Methods:
    >    I wanted an experiment where relatively complete data are available
    >    in the lower symmetry. To get something that is available to everyone, I
    >    chose one from the PDB. A good example is 2CHR, in space group I422, which
    >    was originally solved, and its data deposited, in I4 with two molecules in
    >    the asymmetric unit (structure 1CHR).
    >
    >    2CHR statistics as reported in the PDB (refined 8.0-3.0 A, 2-sigma cutoff):
    >              R      R-free   completeness (%)   Nfree
    >              0.189  0.264    81.4               442 (4.86%)
    >    Further refinement in phenix with the same free set, no sigma cutoff:
    >        10 macrocycles of bss, individual XYZ and individual ADP refinement
    >        (phenix default strategy)
    >        Resol 37.12 - 3.00 A, 92.95% complete, Nrefl=9087 Nfree=442 (4.86%)
    >        Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
    >        Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
    >        (2chr_orig_001.pdb)
    >
    >    The number of free reflections is small, so the uncertainty
    >    in Rfree is large (a good case for Rcomplete).
    >    Instead, for better statistics, use a new 10% free set and repeat 10
    >    times, 50 macrocycles each, with different random seeds:
    >        R: 0.1725±0.0013  Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
    >        Nfree: 908.9±0.32  Nrefl: 9087
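To put a rough number on "the uncertainty in Rfree is large", the sketch below uses the commonly quoted rule of thumb sigma(Rfree) ~ Rfree / sqrt(Nfree). This is an assumption for illustration: it ignores correlations between reflections and is only an order-of-magnitude guide, and the function name is hypothetical.

```python
import math

def rfree_sigma(rfree: float, nfree: int) -> float:
    """Rough standard uncertainty of Rfree.

    Assumption: sigma(Rfree) ~ Rfree / sqrt(Nfree), a rule of thumb only.
    """
    return rfree / math.sqrt(nfree)

# Original 5% free set (442 reflections) vs the new 10% sets (about 909 reflections)
for label, rfree, nfree in [("5% set", 0.2403, 442), ("10% set", 0.2507, 909)]:
    print(f"{label}: Rfree = {rfree:.4f} +/- {rfree_sigma(rfree, nfree):.4f}")
```

For comparison, the spread actually observed over the ten 10% runs above was ±0.0062.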
    >
    >    For artificially low symmetry, expand the I422 structure to a dimer (making
    >    what I call 3chr for convenience, although I'm sure that ID has been taken):
    >
    >    pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
    >    exclude header
    >    spacegroup I4
    >    cell 111.890  111.890  148.490  90.00  90.00  90.00
    >    symgen  X,Y,Z
    >    symgen X,1-Y,1-Z
    >    CHAIN SYMMETRY 2 A B
    >    eof
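To make explicit what the second SYMGEN line does to each atom, here is a small Python sketch (not pdbset's own code) that applies the operator X,1-Y,1-Z to orthogonal coordinates. It assumes the standard PDB orthogonalization, which for this cell (all angles 90 degrees) reduces to dividing or multiplying each coordinate by its cell length; the function name and the test atom are hypothetical.

```python
# Cell of 2CHR/3chr as given in the pdbset input (tetragonal: all angles 90 degrees)
CELL = (111.890, 111.890, 148.490)

def apply_x_1my_1mz(xyz, cell=CELL):
    """Apply the symmetry operator (X, 1-Y, 1-Z) to orthogonal coordinates in Angstrom."""
    a, b, c = cell
    # Orthogonal -> fractional (valid only for a cell with 90-degree angles)
    fx, fy, fz = xyz[0] / a, xyz[1] / b, xyz[2] / c
    # Operator: x' = x, y' = 1 - y, z' = 1 - z
    fx, fy, fz = fx, 1.0 - fy, 1.0 - fz
    # Fractional -> orthogonal
    return (fx * a, fy * b, fz * c)

atom = (10.0, 20.0, 30.0)  # hypothetical atom position
mate = apply_x_1my_1mz(atom)
print(mate)
```

Because the operator is a two-fold, applying it twice returns the original position.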
    >
    >    Get the structure factors from 1CHR: 1chr-sf.cif
    >    Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
    >    This file has no free set (deposited 1993), so tell phenix to generate
    >    one. I don't want phenix to protect me from my own stupidity, so I use:
    >              generate = True
    >              use_lattice_symmetry = False
    >              use_dataman_shells = False
    >          (the .eff file with all non-default parameters is available as
    >    3chr_rand_001.eff in the .tgz mentioned above)
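The difference between picking free reflections randomly and picking them with lattice symmetry can be sketched roughly as follows. This is a minimal illustration, not phenix's implementation; the function names are hypothetical, and the pairing rule (h,k,l) <-> (k,h,l), with h = k or h = 0 treated as self-related, is an assumption based on the h>k / h<k split and the invariant count given in this message.

```python
import random

def mate(hkl):
    """Partner of (h, k, l) under the I422 operator not used in I4 (assumed rule)."""
    h, k, l = hkl
    return hkl if h == 0 or h == k else (k, h, l)

def pick_free_random(hkls, fraction, seed):
    """Flag each reflection independently, ignoring lattice symmetry."""
    rng = random.Random(seed)
    return {hkl for hkl in hkls if rng.random() < fraction}

def pick_free_in_pairs(hkls, fraction, seed):
    """Flag whole symmetry-related pairs, so a free reflection never has a working mate."""
    rng = random.Random(seed)
    present = set(hkls)
    free = set()
    # Visit each pair once, via its smaller member, in a reproducible order
    for key in sorted({min(hkl, mate(hkl)) for hkl in present}):
        if rng.random() < fraction:
            free.update(m for m in (key, mate(key)) if m in present)
    return free

# Usage on a hypothetical set of indices
hkls = [(h, k, l) for h in range(8) for k in range(1, 8) for l in range(10)]
free_random = pick_free_random(hkls, 0.10, seed=1)
free_paired = pick_free_in_pairs(hkls, 0.10, seed=1)
print(len(hkls), len(free_random), len(free_paired))
```

With the random choice, most free reflections end up with a working mate; with the paired choice, none do.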
    >
    >    For more significance, use the script multirefine.csh to repeat the
    >    refinement 10 times with different random seeds. After each run, grep
    >    the significant results into a log file.
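The grepped log can then be reduced to the mean ± standard deviation figures quoted in the results. A minimal Python sketch, assuming each run contributes one line in the same form as the phenix.refine "Final:" line quoted earlier in this message; the regular expression and function name are hypothetical, and the two log lines in the usage are made-up numbers that only exercise the parser.

```python
import re
import statistics

# Assumed log-line format, modelled on the "Final:" line quoted in this message
FINAL_RE = re.compile(r"Final:\s*r_work\s*=\s*([\d.]+)\s+r_free\s*=\s*([\d.]+)")

def summarize(lines):
    """Return ((mean, sd) of r_work, (mean, sd) of r_free) over all 'Final:' lines."""
    rwork, rfree = [], []
    for line in lines:
        m = FINAL_RE.search(line)
        if m:
            rwork.append(float(m.group(1)))
            rfree.append(float(m.group(2)))
    if len(rfree) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return ((statistics.mean(rwork), statistics.stdev(rwork)),
            (statistics.mean(rfree), statistics.stdev(rfree)))

# Made-up lines, only to exercise the parser (not results from these refinements)
log = [
    "Final: r_work = 0.1700 r_free = 0.2400 bonds = 0.010 angles = 1.200",
    "Final: r_work = 0.1800 r_free = 0.2600 bonds = 0.010 angles = 1.200",
]
(rw_mean, rw_sd), (rf_mean, rf_sd) = summarize(log)
print(f"R: {rw_mean:.4f}+/-{rw_sd:.4f}  Rfree: {rf_mean:.4f}+/-{rf_sd:.4f}")
```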
    >
    >
    >    To check that this gives free reflections related to working reflections,
    >    I used mtz2various and a Fortran program (sortfree.f in the .tgz) to
    >    separate the data (3chr_rand_data.mtz) into two asymmetric units, h,k,l
    >    with h>k (columns 4-5) and with h<k (columns 6-7), and listed the pairs
    >    as follows:
    >
    >    mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
    >        LABIN FP=F-obs DUM1=R-free-flags
    >        OUTPUT USER '(3I4,2F10.5)'
    >    eof
    >    sortfree <<eof >sort3.hkl
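sortfree.f itself is only in the .tgz, so the following is a hypothetical Python re-creation of the pairing step, not the author's program. It reads lines written with the mtz2various format (3I4,2F10.5) and assumes the unused I422 operator maps (h,k,l) to (k,h,l); reflections with h = k or h = 0 have no distinct partner and are skipped, as are unpaired ones. Which half of each pair is printed first is just a convention here.

```python
def read_user_format(lines):
    """Parse lines written with the Fortran format (3I4,2F10.5): h, k, l, F, free flag."""
    data = {}
    for line in lines:
        if not line.strip():
            continue
        h, k, l = int(line[0:4]), int(line[4:8]), int(line[8:12])
        f, flag = float(line[12:22]), float(line[22:32])
        data[(h, k, l)] = (f, flag)
    return data

def pair_rows(data):
    """Yield (h, k, l, F, free, F*, free*) for each (h,k,l)/(k,h,l) pair, listed once."""
    for (h, k, l), (f, flag) in sorted(data.items()):
        if h == 0 or h >= k:  # self-related, or the other half of a pair
            continue
        partner = data.get((k, h, l))
        if partner is not None:
            yield (h, k, l, f, flag, partner[0], partner[1])

# Usage: the 1 2 21 pair from the listing, plus one h = k reflection (hypothetical)
lines = [
    f"{1:4d}{2:4d}{21:4d}{101.2:10.5f}{1.0:10.5f}",
    f"{2:4d}{1:4d}{21:4d}{104.84:10.5f}{0.0:10.5f}",
    f"{3:4d}{3:4d}{5:4d}{50.0:10.5f}{0.0:10.5f}",
]
rows = list(pair_rows(read_user_format(lines)))
for row in rows:
    print("%4d%4d%4d%10.2f%8.2f%10.2f%8.2f" % row)
```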
    >
    >    sort3.hkl looks like:
    >                      ______h>k______    ______h<k______
    >           h   k   l         F    free          F*    free*
    >           1   2   3    208.97    0.00      174.95    0.00
    >           1   2   5    226.85    0.00      191.65    0.00
    >           1   2   7    144.85    0.00      164.86    0.00
    >           1   2   9    251.26    0.00      261.71    0.00
    >           1   2  11    333.84    0.00      335.18    0.00
    >           1   2  13    800.37    0.00      791.77    0.00
    >           1   2  15    412.92    0.00      409.90    0.00
    >           1   2  17    306.99    0.00      317.53    0.00
    >           1   2  19    225.54    0.00      220.91    0.00
    >           1   2  21    101.20    1.00*     104.84    0.00
    >           1   2  23    156.27    0.00      156.49    0.00
    >           1   2  25    202.97    0.00      202.23    0.00
    >           1   2  27    216.10    0.00      219.28    0.00
    >           1   2  29    106.76    0.00      100.93    0.00
    >           1   2  31    157.32    0.00      154.37    1.00*
    >           1   2  33     71.84    0.00       20.78    0.00
    >           1   2  35    179.05    0.00      165.67    0.00
    >           1   2  37    254.04    0.00      239.96    1.00*
    >           1   2  39     69.56    0.00       30.61    0.00
    >           1   2  41     56.20    0.00       51.02    0.00
    >
    >    I then used awk to find the 1s in the free columns. In one run, out of
    >    6922 pairs of reflections:
    >    674 in the first asu (h>k) are in the free set,
    >    703 in the second asu (h<k) are in the free set,
    >    and only 11 pairs have the reflections in both asu free.
    >
    >    Out of 16011 reflections in I4: 6922 pairs (=13844 reflections),
    >    1049 invariant (h=k or h=0), and 1118 with an absent mate.
    >
    >    Out of 1601 free reflections:
    >    On average 15 free reflections were in pairs, 212 were invariant under
    >    the operator (no sym-mate) and 1374 (86%) were paired with working
    >    reflections.
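The awk tally can also be expressed as a short Python sketch that classifies each free reflection by the status of its symmetry mate. It is a hypothetical illustration: the pairing rule (h,k,l) <-> (k,h,l), with h = k or h = 0 counted as self-related, is assumed from the counts above, and the indices in the usage are made up.

```python
from collections import Counter

def mate(hkl):
    """Partner under the unused I422 operator (assumed rule; see lead-in)."""
    h, k, l = hkl
    return hkl if h == 0 or h == k else (k, h, l)

def classify_free(all_hkls, free):
    """Count free reflections by the status of their symmetry mate."""
    present = set(all_hkls)
    counts = Counter()
    for hkl in free:
        partner = mate(hkl)
        if partner == hkl:
            counts["invariant"] += 1
        elif partner not in present:
            counts["absent mate"] += 1
        elif partner in free:
            counts["mate free"] += 1
        else:
            counts["mate working"] += 1
    return counts

# Tiny hypothetical example, not data from 1CHR
hkls = [(1, 2, 3), (2, 1, 3), (1, 2, 5), (2, 1, 5), (2, 2, 4), (1, 3, 7)]
free = {(1, 2, 3), (2, 1, 3), (1, 2, 5), (2, 2, 4), (1, 3, 7)}
counts = classify_free(hkls, free)
print(dict(counts))
print(f"fraction paired with working reflections: {counts['mate working'] / len(free):.2f}")
```

In this sketch the message's "invariant (no sym-mate)" category would presumably correspond to invariant plus absent mate together.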
    >
    >    Then do 10 more runs of 50 macrocycles with
    >          use_lattice_symmetry = True
    >    so that free reflections are picked in symmetry-related pairs,
    >    collecting the same statistics (also scripted in multirefine.csh).
    >
    >    Finally, use ref2chr.eff to refine (as mentioned previously) a monomer
    >    in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles each
    >    (also scripted in multirefine.csh).
    >
    >    
########################################################################
    >
    >    To unsubscribe from the CCP4BB list, click the following link:
    > https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1 

------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research     Tel: + 44 1223 336500
The Keith Peters Building                               Fax: + 44 1223 336827
Hills Road                                      E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk 