Re: [ccp4bb] Free Reflections as Percent and not a Number

Edward A. Berry Sun, 04 Jan 2015 15:23:32 -0800

I didn't think about that. Yes, weights were being re-optimized each time.


On 01/04/2015 04:39 PM, Ian Tickle wrote:

Hi Ed

The R, Rfree and RMSDs will all depend to some extent on the Wa factor and this 
may depend on the starting point, assuming of course that the program is 
automatically adjusting the Wa factor according to some criterion (you didn't 
say).  The obvious way to check this would be to keep the Wa factor fixed 
throughout.

I didn't observe this behaviour when I tested the 'jiggling' of the 
co-ordinates, though admittedly I haven't performed your exact experiment (I 
will).  I agree that an RMSD of 0.054 should be well within the radius of 
convergence at 1.8 A.

Cheers

-- Ian

On 4 January 2015 at 17:50, Edward A. Berry <ber...@upstate.edu> wrote:

    On 11/25/2014 01:41 PM, Tim Gruene wrote:

        Hi Ed,

        it is an easy exercise to show that theory (according to "by
        definition") and reality greatly diverge - refinement is too complex to
        get back to exactly the same structure. Maybe because one often does not
        reach convergence, no matter how  many cycles of refinement you run.


    Yes, I was able to convince myself of this.

    I took a structure which I considered to be refined to convergence.
    I didn't see any way to refine with no free set in phenix, even with a
    least-squares target function, so I refined against a newly chosen free set,
    which should test the same principle. I turned off HQN flips and real-space
    refinement.
    After 2 rounds of 3 macrocycles each of individual ADP and XYZ refinement,
    comparing the structure with the original gave an all-atom RMSD of 0.0540 A
    and a maximum displacement of 1.0161 A. Should be within radius of
    convergence, right?

    So then I returned to the original free set in order to let it refine back
    to the original position. R-free started out equal to R but soon increased
    to approximately the original value: 0.2011 0.2256 vs original 0.2036 0.2278
    (this is a 1.8 A structure). So far so good.

    But looking at the RMSD compared to the original, the numbers never
    decreased; they continued to increase with every cycle. The structure is not
    returning to the original but finding equally good solutions in the
    neighborhood. I guess it is meandering about an essentially flat plateau (or
    better, a flat-bottomed valley). It does not reach convergence, no matter
    how many cycles of refinement you run.
    eab
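
A minimal sketch of the comparison described above: the all-atom RMSD and the maximum displacement between two models with identical atom ordering. It assumes both models sit in the same crystal frame, so no superposition is applied. The function name and the three-atom coordinates are hypothetical and are not taken from the thread.

```python
import math

def rmsd_and_max_dev(coords_a, coords_b):
    """Return (all-atom RMSD, maximum displacement) for two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Squared displacement of each atom between the two models.
    sq = [sum((p - q) ** 2 for p, q in zip(a, b))
          for a, b in zip(coords_a, coords_b)]
    rmsd = math.sqrt(sum(sq) / len(sq))
    max_dev = math.sqrt(max(sq))
    return rmsd, max_dev

# Hypothetical coordinates in Angstrom, for illustration only.
original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
refined = [(0.0, 0.0, 0.1), (1.5, 0.0, 0.0), (1.5, 1.2, 0.4)]

rmsd, max_dev = rmsd_and_max_dev(original, refined)
print(f"RMSD = {rmsd:.4f} A, max displacement = {max_dev:.4f} A")
```

Tracking both numbers after every macrocycle is what shows whether the model is drifting back toward the original or wandering away from it.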




        Best,
        Tim

        On 11/25/2014 07:29 PM, Edward A. Berry wrote:

                provided the jiggling keeps the structure inside the convergence
                radius of refinement, then by definition the refinement will
                produce the same result irrespective of the starting point (i.e.
                jiggled or not).  If the jiggling takes the structure outside the
                radius of convergence then the original structure will not be
                retrievable without manual rebuilding: I'm assuming that's not
                the goal here.



            I actually agree with this, but an R-free purist might argue that
            you have to get outside the radius of convergence to eliminate R-free
            bias. Otherwise, by definition, "you will just refine back to the same
            old biased structure!"
            (But you have shown that the conventional 0.2 A rms is within the
            radius of convergence.)

            In fact Dale's concern about low-res reflections could be put in
            terms of radius of convergence and false minima.
            Moving a lot of atoms by 0.2 A will have a significant effect on the
            phase of a 2 A reflection, but almost no effect on a 20 A reflection.
            Say you have refined against all the low-resolution reflections, and
            got a structure that fits better than it should because it is fitting
            the noise in the free reflections. Now take away the free reflections
            and continue to refine. It will drop into the nearest local minimum,
            which, since it is near the solution with all reflections, will still
            give an artificially low R-free.  Jiggling by 0.2 A will have no effect
            because the local minima are extremely broad and shallow, as far as
            the low-res reflections go.
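
The 2 A versus 20 A comparison above can be put into numbers. An atom at position x contributes a phase of 2*pi*(s.x) to a reflection with scattering vector s, where |s| = 1/d, so a shift of delta along s changes that phase by 360*delta/d degrees. The sketch below evaluates this worst-case, single-atom figure; the function name is hypothetical, and the net phase change of a real reflection depends on how all the atoms move.

```python
def phase_shift_degrees(shift_A, d_spacing_A):
    """Largest phase change (degrees) of one atom's contribution to a
    reflection of spacing d when the atom moves shift_A along the
    scattering vector."""
    if d_spacing_A <= 0:
        raise ValueError("d-spacing must be positive")
    return 360.0 * shift_A / d_spacing_A

# The 0.2 A shift and the two resolutions mentioned in the message above.
for d in (2.0, 20.0):
    shift = phase_shift_degrees(0.2, d)
    print(f"d = {d:4.1f} A: up to {shift:.1f} degrees per atom")
```

The tenfold difference in d-spacing gives a tenfold difference in the phase change, which is the sense in which the low-resolution terms barely notice a 0.2 A jiggle.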

            But then you could say that since any local minima are so broad, all
            structures that are even slightly reasonable (including the correct
            one) will be within radius of convergence of the same minimum as
            far as the low-res reflections are concerned. The nearest false
            minimum involves moving atoms by 5-10 A, so within reason the
            convergence point will be completely independent of the starting
            structure. Presumably this is why Phenix rigid body refinement starts
            out at ultra-low resolution: to increase the radius of convergence.
            From that perspective, rather than being the worrisome part, the
            low-resolution range is the region where we can assume Ian's
            assumption is correct.

            What about another experiment, which I think we've discussed before?
            Take a structure refined to convergence with a pristine free set.
            Now refine to convergence against all the data. The purist will say
            that the free set is hopelessly corrupted. And sure enough, when we
            take that structure and calculate free-R with the original set,
            R-free is the same as R-work within statistical significance.  But I
            guess adding the extra 5% reflections will not change any atomic
            position by more than 0.2 A (maybe 0.02 A), and so we are still well
            within radius of convergence of the original unbiased structure.
            Refining against the original working set will give back that
            unbiased structure, and Rfree will return to its original value.

            This suggests that, if the only purpose of Rfree is to get a number
            to deposit with the pdb (which it is not), you should first solve
            your structure using all the data, fitting the noise; then exclude a
            free set and back off on fitting the noise of it to get the R-free.
            The only problem would be that during the refinement without the
            guidance of R-free, you may have engaged in some practice that hurt
            the structure so much that it ends up out of RoC of the well-refined
            structure. Not because you were fitting the noise (anyway you are
            fitting the noise in your 95% working set) but because you would not
            have been warned that some procedure was not helping.

            Very provocative discussion!
            eab


            On 11/25/2014 11:03 AM, Ian Tickle wrote:

                Dear All

                I'd like to raise the question again of whether any of this
                'jiggling' (i.e. addition of random noise to the co-ordinates) is
                really necessary anyway, notwithstanding Dale's valid point that
                even if it were necessary, jiggling in its present incarnation is
                unlikely to work because it's unlikely to erase the influence of
                low res. reflexions.

                My claim is that jiggling is completely unnecessary, because I
                maintain that refinement to convergence is all that is required
                to remove the bias when an alternate test set is selected.  In
                fact I claim that it's the refinement, not the jiggling, that's
                wholly responsible for removing the bias.  I know we thrashed
                this out a while back and I recall that the discussion ended with
                a challenge to me to prove my claim that the refine-only Rfrees
                are indeed unbiased. I couldn't see an easy way of doing this
                which didn't involve rebuilding and re-refining the same
                structure 20 times over, without introducing any observer bias.

                The present discussion prompted me to think again about this and
                I believe I can prove part of my claim quite easily, that
                jiggling has no effect on the results.  Proving that the
                resulting Rfrees are unbiased is much harder, since as we've seen
                there's no proof that jiggling actually removes the bias as
                claimed by its proponents. However given that said proponents of
                jiggling+refinement have been happy to accept for many years that
                their results are unbiased, then they must be equally happy now
                to accept that the refinement-only results are also unbiased,
                provided I can demonstrate that the difference between the
                results is insignificant.

                The experimental proof rests on comparison between the Rfrees and
                RMSDs of the jiggled+refined and the refined-only structures for
                the 19 possible alternate test sets (assuming 5% test-set size).
                If jiggling makes no difference as I claim then there should be
                no significant difference between the Rfrees and insignificant
                RMSDs for all pairs of alternate test sets.
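
Where the "19 possible alternate test sets" come from can be sketched as follows: a 5% test set corresponds to splitting the reflections into 20 groups, one of which (set 0 here) is the original test set, leaving 19 alternates. The flagging scheme below is a hypothetical illustration, not the procedure any particular program uses; real flagging tools draw flags at random, so set sizes vary slightly, as with the 611-reflection set 0 quoted in this message.

```python
import random

def assign_free_flags(n_reflections, n_sets=20, seed=0):
    """Give each reflection a flag 0..n_sets-1 so that every flag
    marks roughly 1/n_sets of the data (hypothetical scheme)."""
    flags = [i % n_sets for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)  # decouple the flag from reflection order
    return flags

N_SETS = 20                       # 1/20 = 5% per test set
flags = assign_free_flags(12174)  # 11563 working + 611 test reflections
sizes = [flags.count(k) for k in range(N_SETS)]

# Set 0 is the original test set; every other flag defines an alternate.
alternates = [k for k in range(N_SETS) if k != 0]
print(f"{len(alternates)} alternate test sets, "
      f"{min(sizes)}-{max(sizes)} reflections each")
```

Each alternate flag then defines one refinement run in which that group is withheld and the other 19 groups, including the old test set, form the working set.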

                However, first we must be careful to establish what is a suitable
                value for the noise magnitude to add to the co-ordinates.  If
                it's too small it won't remove the bias (again notwithstanding
                Dale's point that it's unlikely to have any effect anyway on the
                low res. data); too large and you push it beyond the convergence
                radius of the refinement and end up damaging the structure
                irretrievably (at least unless you're prepared to do significant
                rebuilding of the model).

                For the record here's the crystal info for the test data I selected:

                Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
                Resol: 40-1.58 A.
                Working set size: 11563   Test set size: 611 (5%)   Test set: 0
                Refinement program:     BUSTER.
                Noise addition program: PDBSET.

                It's wise to choose a small protein since you need to run lots of
                refinements!  However feel free to try the same thing with your
                own data.

                First I took care that the starting model was refined to
                convergence using the original test set 0, and I performed 2
                sequential runs of refinement with BUSTER (the deviations are
                relative to the input co-ordinates in each case):

                Ncyc   Rwork   Rfree   RMSD    MaxDev
                  82   0.181   0.230   0.005   0.072
                  51   0.181   0.231   0.002   0.015

                The advantage of using BUSTER is that it has its own convergence
                test; with REFMAC you have to guess.

                Then I tried a range of input noise values (0.20, 0.25, 0.30,
                0.35, 0.40, 0.50 A) on the refined starting model.  Note that
                these are RMSDs, not maximum shifts as claimed by the PDBSET
                documentation.  In each case I did 4 sequential runs of BUSTER on
                the jiggled co-ordinates and by looking at the RMSDs and max.
                shifts I decided that 0.25 A RMSD was all the structure could
                stand without risking permanent damage (note that the default
                noise value in PDBSET is 0.2):

                Initial RMSD: 0.248  MaxDev: 0.407

                Ncyc   Rwork   Rfree   RMSD    MaxDev
                 358   0.183   0.230   0.052   0.454
                 126   0.181   0.232   0.041   0.383
                  65   0.181   0.232   0.040   0.368
                  50   0.181   0.232   0.040   0.360

                The only purpose of the above refinements is to establish the
                most suitable noise value; the resulting refined PDB files were
                not used.

                So then I took the co-ordinates with 0.25 A noise added and for
                each test set 1-19 did 2 sequential runs of BUSTER.

                Finally I took the original refined starting model (i.e. without
                noise addition) and again refined to convergence using all 19
                alternate test sets.

                The results are attached.  The correlation coefficient between
                the 2 sets of Rfrees is 0.992 and the mean RMSD between the sets
                is 0.04 A, so the difference between the 2 sets is indeed
                insignificant.
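
A minimal sketch of the kind of correlation coefficient quoted above, taken here to be the Pearson coefficient between paired Rfree values (an assumption; the message does not say which coefficient was used). The two short lists are placeholder numbers invented for illustration; they are not the attached results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Placeholder Rfree values, one pair per alternate test set (invented).
rfree_jiggled = [0.231, 0.228, 0.236, 0.224, 0.233]
rfree_refined_only = [0.232, 0.227, 0.235, 0.225, 0.233]

r = pearson(rfree_jiggled, rfree_refined_only)
print(f"correlation between the two sets of Rfrees: {r:.3f}")
```

A coefficient near 1 says the two protocols rank the test sets the same way; the mean RMSD between the paired models answers the separate question of whether the coordinates themselves agree.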

                I don't find this result surprising at all: provided the jiggling
                keeps the structure inside the convergence radius of refinement,
                then by definition the refinement will produce the same result
                irrespective of the starting point (i.e. jiggled or not).  If the
                jiggling takes the structure outside the radius of convergence
                then the original structure will not be retrievable without
                manual rebuilding: I'm assuming that's not the goal here.

                I suspect that the idea of jiggling may have come about because
                refinements have not always been carried through to convergence:
                clearly if you don't do a proper job of refinement then you must
                expect some of the original bias to remain.  Also to head off the
                suggestion that simulated annealing refinement would fix this I
                would suggest that any kind of SA refinement is only of value for
                initial MR models when there may be significant systematic error
                in the model; it's not generally advisable to perform it on final
                refined models (jiggled or not) when there is no such systematic
                error present.

                Cheers

                -- Ian


                On 21 November 2014 18:56, Dale Tronrud <de...@daletronrud.com> wrote:



            On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
               > <snip...>

                As to the convenience of carrying over a test set to another
                dataset, Eleanor made a suggestion to circumvent this necessity
                some time ago: pass your coordinates through pdbset and add some
                noise before refinement:


                pdbset xyzin xx.pdb xyzout yy.pdb <<eof
                noise 0.4
                eof



            I've heard this "debiasing" procedure proposed before, but I've
            never seen a proper test showing that it works.  I'm concerned that
            this will not erase the influence of low resolution reflections that
            were in the old working set but are now in the new test set.  While
            adding 0.4 A gaussian noise to a model would cause large changes to
            the 2 A structure factors I doubt it would do much to those at 10 A.
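
The doubt expressed above can be given a rough scale. If every atom receives independent, isotropic Gaussian noise with per-axis standard deviation sigma, the calculated structure factor averaged over the noise is scaled by exp(-2*pi^2*sigma^2/d^2), the same form as a Debye-Waller factor with B = 8*pi^2*sigma^2. The sketch below assumes the 0.4 A figure is the per-axis sigma; if it is instead the three-dimensional RMS shift, sigma would be 0.4/sqrt(3) and the attenuation weaker. The function name is hypothetical.

```python
import math

def mean_attenuation(sigma_A, d_spacing_A):
    """Expected scale factor on a calculated structure factor at spacing d
    after adding independent isotropic Gaussian noise with per-axis
    standard deviation sigma_A to every atom."""
    if d_spacing_A <= 0:
        raise ValueError("d-spacing must be positive")
    return math.exp(-2.0 * math.pi ** 2 * sigma_A ** 2 / d_spacing_A ** 2)

SIGMA = 0.4  # A; assumed to be the per-axis standard deviation
for d in (2.0, 10.0):
    factor = mean_attenuation(SIGMA, d)
    print(f"d = {d:4.1f} A: mean structure factor scaled by {factor:.3f}")
```

Under these assumptions the 2 A terms lose more than half of their original signal on average, while the 10 A terms keep nearly all of it, which is consistent with the concern that uncorrelated noise leaves the low-resolution data essentially untouched.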

            It seems to me that one would have to have random, but correlated,
            shifts in atomic parameters to affect the low resolution data - waves
            of displacements, sometimes to the left and other times to the right.
            You would need, of course, a superposition of such waves that span
            all the scales of resolution in the data set.

            Has anyone looked at the pdbset jiggling results and shown that the
            low resolution data are scrambled?

            Dale Tronrud

                Xavier


                On 20/11/14 11:43 PM, Keller, Jacob wrote:

                    Dear Crystallographers,


                    I thought that for reliable values for Rfree, one needs only
                    to satisfy counting statistics, and therefore using at most a
                    couple thousand reflections should always be sufficient.
                    Almost always, however, some seemingly-arbitrary percentage
                    of reflections is used, say 5%. Is there any rationale for
                    using a percentage rather than some absolute number like
                    1000?
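
The counting-statistics argument in the question can be illustrated with a short sketch. It takes "counting statistics" to mean a relative uncertainty that scales as 1/sqrt(N) for N test reflections; that scaling is an assumption made here for illustration and is not stated in the thread. The first data-set size is the 12174 reflections of the test case quoted elsewhere in this thread; the other two are arbitrary.

```python
import math

def relative_uncertainty(n_test):
    """Relative uncertainty under the assumed 1/sqrt(N) scaling."""
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return 1.0 / math.sqrt(n_test)

FIXED_COUNT = 1000  # the absolute number floated in the question

for n_total in (12174, 50000, 200000):
    n_percent = round(0.05 * n_total)    # 5% rule
    n_fixed = min(FIXED_COUNT, n_total)  # fixed-count rule
    print(f"{n_total:7d} reflections: "
          f"5% -> {n_percent:5d} ({relative_uncertainty(n_percent):.1%}), "
          f"fixed -> {n_fixed:5d} ({relative_uncertainty(n_fixed):.1%})")
```

Under that assumption a fixed count gives the same relative uncertainty for every data set, whereas a fixed percentage gives a noisier estimate for small data sets and withholds more reflections than needed for large ones.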


                    All the best,


                    Jacob


                    *******************************************
                    Jacob Pearson Keller, PhD
                    Looger Lab/HHMI Janelia Research Campus
                    19700 Helix Dr, Ashburn, VA 20147
                    email: kell...@janelia.hhmi.org
                    *******************************************


