The images do seem available via the RCSB page: https://proteindiffraction.org/project/6nkq/
On Wednesday, 12 June 2019, 22:43:26 BST, Gerard Bricogne <g...@globalphasing.com> wrote: Dear Ian and James, This PDB entry, apart from having the peculiarity of having 6 molecules in the asymmetric unit but also 6 twin domains of about equal importance, has very anisotropic diffraction, and the deposited data have been absolutely massacred by the isotropic cut-off applied. See the attached picture, where the boundary between the orange and the yellow is at a local average <I/sig(I)> value of 8.17, and that between yellow and green at a value of 18.68. There has therefore been considerable loss of significant data as a result of the isotropic cut-off applied. Readers may check this for themselves in 3D by using the PDBpeep server at http://staraniso.globalphasing.org/cgi-bin/PDBpeep.cgi (just enter the code 6nkq in the box provided). If images could be made available, they could be given a better chance to produce all the diffraction data they actually contain. I haven't tried to work out how the declared twinning would interact with the NCS. With best wishes, Gerard. -- On Wed, Jun 12, 2019 at 10:03:04PM +0100, Ian Tickle wrote: > Hi James > > Thanks, will do. > > Cheers > > -- Ian > > > On Wed, 12 Jun 2019 at 22:02, Holton, James M <jmhol...@slac.stanford.edu> > wrote: > > > try 6nkq ? > > > > -James Holton > > MAD Scientist > > > > On 6/12/2019 11:46 AM, Ian Tickle wrote: > > > > > > Dear Jon & Randy > > > > I did a test of this using the 2FUQ data which is one of the problematic > > cases you mention where the NCS is nearly crystallographic (in this case an > > NCS 2-fold parallel to b in P212121): > > > > Transformation matrix: > > -0.99992 0.01204 0.00354 > > 0.01200 0.99989 -0.00918 > > -0.00365 -0.00914 -0.99995 > > > > Eulerian rotation: 291.08 179.44 291.77 > > Orthogonal translation: 72.125 0.021 100.886 > > > > For the refinement I used BUSTER with its automated similarity restraint > > (autoncs) feature. 
It makes no significant difference to the result > > whether I use FREERFLAG or SFTOOLS/RFREE/SHELL to create the Rfree flags. > > > > For FREERFLAG: > > > > Starting Rwork/Rfree = 0.3002 0.3008 > > Final Rwork/Rfree = 0.2012 0.2245 > > > > For SFTOOLS/RFREE/SHELL: > > > > Starting Rwork/Rfree = 0.3001 0.3014 > > Final Rwork/Rfree = 0.2012 0.2255 > > > > This was after jiggling the co-ordinates and setting all B factors to the > > average. In fact that's not necessary: to 3 d.p.s you get the same result > > just using the deposited co-ordinates & B factors: > > > > For FREERFLAG: > > > > Starting Rwork/Rfree = 0.2702 0.2674 > > Final Rwork/Rfree = 0.2007 0.2236 > > > > For SFTOOLS/RFREE/SHELL: > > > > Starting Rwork/Rfree = 0.2700 0.2707 > > Final Rwork/Rfree = 0.2007 0.2240 > > > > For this to work the refinement must be run until convergence, then it > > will simply refine to the same structure with no 'memory' of the starting > > structure: BUSTER seems to do a good job in this respect (it runs about 400 > > iterations). > > > > This is admittedly a single example: I haven't attempted the more > > extensive tests that Jon did mainly because I don't have more examples of > > cases where the NCS is nearly crystallographic and where if there is any > > effect it would be most likely to show up. > > > > Anyway my take on this from this one example is that neither NCS > > restraints nor Rfree flag selection nor jiggling makes any difference, even > > in that worst case scenario. I suspect it may be that Rfree is a global > > statistic that is just not sensitive enough to detect that. > > > > Cheers > > > > -- Ian > > > > > > > > > > On Wed, 5 Jun 2019 at 15:08, Randy Read <rj...@cam.ac.uk> wrote: > > > >> Dear Ian, > >> > >> I think the missing ingredient in your argument is an assumption that may > >> be implicit in what others have written: if you have NCS in your crystal, > >> you should be restraining that NCS in your model. 
If you do that, then the > >> NCS-related Fcalcs will be similar (especially in the particularly > >> problematic case where the NCS is nearly crystallographic), and if the > >> working reflections are over-fit to match the Fobs values, then the free > >> reflections that are related by the same NCS will also be overfit. So the > >> measurement errors don't have to be correlated, just the modelling errors. > >> > >> Best wishes, > >> > >> Randy > >> > >> On 5 Jun 2019, at 13:58, Ian Tickle <ianj...@gmail.com> wrote: > >> > >> > >> Hi Jon > >> > >> Sorry I didn't intend for my response to be interpreted as saying that > >> anyone has suggested directly that the measurement errors of NCS-related > >> reflection amplitudes are correlated. In fact the opposite is almost > >> certainly true since the only obvious way in practice that errors in Fobs > >> could be correlated is via errors in the batch scale factors which would > >> introduce correlations between errors in Fobs for reflections in the same > >> or adjacent images, but that has nothing to do with NCS. That's the > >> 'elephant in the room': no-one has suggested that reflections on the same > >> or adjacent images should not be split between the working and test sets, > >> yet that's easily the biggest contributor to CV bias with or without NCS! > >> I think taking that effect into account would be much more productive than > >> worrying about NCS, but performing the test-set sampling in shells can't > >> possibly address that, since the images obviously cut across all shells. > >> > >> The point I was making was that correlation of errors in NCS-related Fobs > >> would appear to be the inevitable _implication_ of what certainly has been > >> claimed, namely that NCS can introduce bias into CV statistics if the > >> test-set sampling is not done correctly, i.e. by splitting NCS-related Fobs > >> between the working and test sets. 
Unless there's something I've missed > >> that's > >> the only possible explanation for that claim. This is because overfitting > >> results from fitting the model to the errors in Fobs, and the CV bias > >> arises from correlation of those errors if the NCS-related Fobs are split > >> up, thus causing the degree of overfitting to be underestimated and giving > >> a too-rosy picture of the structure quality. Indeed you seem to be saying > >> that because the NCS-related Fobs are correlated (a patently true > >> statement), then it follows that the errors in those Fobs are also > >> correlated, or at least no more correlated than for non-NCS-related Fobs, > >> but I just don't see how that can be true. > >> > >> Rfree is not unbiased: as a measure of the agreement it is biased upwards > >> by overfitting (otherwise how could it be used to detect overfitting?), by > >> failing to fit with the uncorrelated errors in the test-set Fobs, just as > >> Rwork is biased downwards by fitting to the errors in the working-set > >> Fobs. Overfitting becomes immediately apparent whenever you perform any > >> refinement, so the only point at which there is no overfitting is for the > >> initial model when Rwork and Rfree are equal, apart from a small > >> difference arising from random sampling of the test-set (that sampling > >> error could be reduced by performing refinements with all 20 working/test > >> sets combinations and averaging the R values). From there on the 'gap' > >> between Rwork and Rfree is a measure of the degree of overfitting, so we > >> should really be taking some average of Rwork and Rfree as the true measure > >> of agreement (though the biases are not exactly equal and opposite so it's > >> not a simple arithmetic mean). The goal of choosing the appropriate > >> refinement parameters, restraints and weights is to _minimise_ overfitting, > >> not eliminate it. 
It is not possible to eliminate it completely: if it > >> were then Rwork and Rfree would become equal (apart from that small effect > >> from random sampling). > >> > >> I don't follow your argument about correlation of Fobs from NCS. > >> Overfitting, and therefore CV bias, arises from the _errors_ in the Fobs > >> not from the Fobs themselves, and there's no reason to believe that the > >> Fobs should be correlated with their errors. You say "any correlation > >> between the test-set and the working-set F's due to NCS would be expected > >> to reduce R-free". If the working and test sets are correlated by NCS that > >> would mean that Rwork is correlated with Rfree so they would be reduced > >> equally! There are two components of the Fobs - Fcalc difference: Fcalc - > >> Ftrue (the model error) and Fobs - Ftrue (the data error). The former is > >> completely correlated between the working and test sets (obviously since > >> it's the same model) so what you do to one you must do to the other. The > >> latter can only be correlated by NCS if NCS has an effect on errors in the > >> Fobs, which it doesn't, or by some other effect such as errors in batch > >> scales that are unrelated to NCS. > >> > >> Overfitting is related to the data/parameter ratio so you don't observe > >> the effects of overfitting until you change the model, the parameter set or > >> the restraints. If there were no errors there would be no overfitting and > >> no CV bias (actually there would be no need for cross-validation!). > >> > >> Of course as you say, your tests suggest that there is no CV bias from > >> NCS, in which case there's absolutely nothing to explain! 
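The statements above that overfitting tracks the data/parameter ratio, and that without errors there would be no overfitting, can be illustrated with a generic least-squares toy. This is only a sketch under stated assumptions: it is a random linear model, not a crystallographic refinement; RMS residuals stand in for Rwork and Rfree; and the name `fit_and_score` and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def fit_and_score(noise_sigma, n_work=40, n_free=400, n_params=20, seed=0):
    """Fit a linear model to the working set only; return (work RMS, free RMS)."""
    rng = np.random.default_rng(seed)
    beta_true = rng.normal(size=n_params)
    x_work = rng.normal(size=(n_work, n_params))
    x_free = rng.normal(size=(n_free, n_params))
    # "Observations" = true signal + independent measurement error.
    y_work = x_work @ beta_true + rng.normal(scale=noise_sigma, size=n_work)
    y_free = x_free @ beta_true + rng.normal(scale=noise_sigma, size=n_free)
    # Least-squares fit against the working set; the free set is never used.
    beta_fit, *_ = np.linalg.lstsq(x_work, y_work, rcond=None)

    def rms(x, y):
        return float(np.sqrt(np.mean((y - x @ beta_fit) ** 2)))

    return rms(x_work, y_work), rms(x_free, y_free)

work_clean, free_clean = fit_and_score(noise_sigma=0.0)
work_noisy, free_noisy = fit_and_score(noise_sigma=1.0)
print(f"no errors:   work={work_clean:.2e} free={free_clean:.2e}")
print(f"with errors: work={work_noisy:.3f} free={free_noisy:.3f}")
```

With error-free data both residuals vanish, so there is no gap to detect. With errors, the working residual is pulled below the noise level and the free residual is pushed above it, which is the sense in which the gap measures overfitting.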
> >> > >> Cheers > >> > >> -- Ian > >> > >> > >> On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper < > >> 00000c2488af9525-dmarc-requ...@jiscmail.ac.uk> wrote: > >> > >>> Ian, statistics is not my forte, but I don't think anyone is suggesting > >>> that the measurement errors of NCS-related reflection amplitudes are > >>> correlated. In simple terms, since NCS-related F's should be correlated, > >>> the working-set reflection amplitudes could be correlated with those in > >>> the > >>> test-set, if the latter is chosen randomly, rather than in shells. Am I > >>> right in saying that R-free not just indicates over-fitting but, also, > >>> acts > >>> as an unbiased measure of the agreement between Fo and Fc? During a > >>> well-behaved refinement run, in the cycles before any over-fitting becomes > >>> apparent, the decrease in R-free value will indicate that the changes > >>> being > >>> made to the model are making it more consistent with Fo's. In these > >>> stages, > >>> any correlation between the test-set and the working-set F's due to NCS > >>> would be expected to affect the R-free (cross-validation bias), making it > >>> lower than it would be if the test set had been chosen in resolution > >>> shells? However, you are always right and, as you know, I failed to detect > >>> any such effect in my limited tests. Thanks to you and others for > >>> replying. > >>> > >>> > >>> On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry < > >>> ber...@upstate.edu> wrote: > >>> > >>> > >>> On 05/19/2019 08:21 AM, Ian Tickle wrote: > >>> ~~~ > >>> >> So there you have it: what matters is that the _errors_ in the > >>> NCS-related amplitudes are uncorrelated, or at least no more correlated > >>> than the errors in the non-NCS-related amplitudes, NOT the amplitudes > >>> themselves. > >>> > >>> Thanks, Ian! > >>> > >>> I would like to think that it is the errors in Fobs that matter (as may > >>> be the case), because then: > >>> 1. 
ncs would not bias R-free even if you _do_ use ncs
> >>> constraints/restraints. (changes in Fcalc due to a step of refinement would
> >>> be positively correlated between sym-mates, but if the sign of (Fo-Fc) is
> >>> opposite at the sym-mate, what improves the working reflection would worsen
> >>> the free)
> >>> 2. There would be no need to use the same free set when you refine the
> >>> structure against a new dataset (as for ligand studies) since the random
> >>> errors of measurement in Fobs in the two sets would be unrelated.
> >>>
> >>> However when I suggested that in a previous post, I was reminded that
> >>> errors in Fobs account for only a small part of the difference (Fo-Fc). The
> >>> remainder must be due to inability of our simple atomic models to represent
> >>> the actual electron density, or its diffraction; and for a symmetric
> >>> structure and a symmetric model, that difference is likely to be
> >>> symmetric. Whether that difference represents "noise" that we want to
> >>> avoid fitting is another question, but it is likely that (Fo-Fc) will be
> >>> correlated with sym-mates. So I settled for convincing myself that the
> >>> changes in Fc brought about by refinement would be uncorrelated, and thus
> >>> the _changes_ in (Fo-Fc) at each step would be uncorrelated.
> >>>
> >>> Below are some of the ideas I came up with in trying to think about
> >>> this, and about bias in general. (Not very well organized and not the best
> >>> of prose, but if one is a glutton for punishment, or just wants to see how
> >>> the mind of a madman works . . .)
> >>>
> >>> Warning: some of this is contrary to current consensus opinion and the
> >>> conclusions may be, in the words of a popular autobuilding program, partly
> >>> WRONG! In particular, the idea that coupling by the G-function does not
> >>> bias R-free, but rather is the only reason that R-free works at all!
> >>> - - - - - - - - - -
> >>>
> >>> The differences (Fo-Fc) can be divided between (1) errors in measurement
> >>> of reflection intensities and (2) failure of the model to represent the
> >>> true structure. The first can be considered "noise" and we would expect
> >>> it to be random, with no correlation between symm mates.
> >>> However most of the difference between Fc and Fobs is not due to random
> >>> noise in the data, but to failures of our model to accurately represent
> >>> the real thing. These differences are likely to be ncs-symmetric.
> >>> Leaving aside the question of whether or not we want to fit this kind of
> >>> "noise" (bringing the model closer to the real structure?), we conclude
> >>> that (Fo-Fc) is likely to be correlated between ncs-mates.
> >>>
> >>> But for refinement against the working set to bias the contribution of
> >>> sym-related free-set reflections to R-free would require that _changes_
> >>> in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
> >>> contrary they are not correlated, i.e. if a change that decreases
> >>> |Fo-Fc| for a working reflection is equally likely to decrease or
> >>> increase |Fo-Fc| for its sym mate (which may be) in the free set, then
> >>> it is hard to see how refinement against the working reflection would
> >>> bias R-free.
> >>>
> >>> Under what conditions would changes in |Fo-Fc| for symmetry related
> >>> reflections be correlated? This would be the case if change in Fc
> >>> correlates AND the sign of (Fo-Fc) correlates. Again, if the difference
> >>> were only due to random error in Fobs, then the sign of Fo-Fc of a
> >>> symmetry related reflection would be as likely to be the opposite as the
> >>> same (as the original reflection) so even if changes in Fc are correlated,
> >>> what improves the fit to the original reflection would be as likely to
> >>> worsen the fit to its mate.
But we concluded above that Fo-Fc is likely to be correlated
> >>> by symmetry, since the shortcomings of our model are likely to be
> >>> symmetric. So we ask if changes in Fc are correlated.
> >>>
> >>> So why should a structural change result in correlated changes of
> >>> symm-related Fc's?
> >>> The Fc is the amplitude of the best-fit sine wave (of the specified
> >>> frequency) to the projection of the density of the crystal onto the
> >>> specified scattering vector. The refinement program can increase Fcalc
> >>> by moving an atom so that its projection on the scattering vector moves
> >>> toward a peak of that sine wave, or decrease it by moving away from a
> >>> peak.
> >>> If the projection of an atom on the scattering vector moves toward a
> >>> peak, the density becomes more peaked and the amplitude increases; if it
> >>> moves toward a trough it tends to take density away from the peak or
> >>> fill in the trough and the density becomes flatter.
> >>>
> >>> But the scattering vector of a sym-related reflection is at a different
> >>> angle, anywhere from almost 0 to 90 degrees from its mate (actually to
> >>> 180 degrees, but then the Friedel mate is close to zero; it's a question
> >>> of how parallel they are, irrespective of direction). The atom we are
> >>> changing will fall at a different position along the rotated scattering
> >>> vector, and its movement may be toward a peak or trough of the projected
> >>> density on that scattering vector.
> >>>
> >>> If the two reflections are close in reciprocal space, their scattering
> >>> vectors will be nearly collinear, the projection of density onto them
> >>> will be similar, and the projection of the atom being moved onto them
> >>> will come at a similar position in these projections.
In that case > >>> moving density so that its projection on one scattering vector moves > >>> toward or away from a peak of its best-fit sine wave will have a similar > >>> effect for the adjacent reflection, and their changes will be correlated. > >>> > >>> But if the reflections are not close in reciprocal space, their > >>> scattering vectors are at different angles, the projection of the > >>> density on them looks quite different, and the projection of the atom > >>> being moved comes at a different position. In this case it is impossible > >>> to predict how changes in the two reflections' amplitudes due to > >>> movement of an atom will correlate without knowing the details of the > >>> density. > >>> > >>> For symmetry-related reflections, the projection of density of the > >>> rotated protomer on the scattering vector of the rotated reflection will > >>> be the same as the projection of the density of the original protomer on > >>> the original reflection (hence the correlation of Fc). (in case the > >>> symmetry is actually crystallographic, as in our case, then the > >>> projection of the entire crystal on the rotated scattering vector will > >>> be the same as its projection on the original reflection's scattering > >>> vector). But the change we are making is only in the original protomer, > >>> not in its symm mate, and so its projection will fall at a different > >>> point along the rotated scattering vector, so whether it moves density > >>> toward a peak or trough is somewhat random. > >>> > >>> If ncs is restrained or constrained, the changes will > >>> also follow ncs-symmetry and so changes in Fc would be expected to be > >>> symmetric. 
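The projection argument above can be tried on a toy model. The sketch below is not the 2CHR experiment; it assumes equal point atoms in a P1 cell, an exact 2-fold along z standing in for the "ncs", and small random shifts in place of a refinement step. It compares the correlation of amplitude changes between a reflection and its symmetry mate when only one protomer is moved and when the same shift is applied to both copies through the operator. Reflections with l = 0 (whose mate is the Friedel mate) and with h = k = 0 (invariant under the operator) are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.diag([-1.0, -1.0, 1.0])  # assumed 2-fold along z, fractional coordinates

def amplitudes(hkl, xyz):
    """|F(hkl)| for unit point atoms at fractional positions xyz."""
    return np.abs(np.exp(2j * np.pi * hkl @ xyz.T).sum(axis=1))

n_atoms = 30
prot_a = rng.random((n_atoms, 3))   # protomer A
prot_b = prot_a @ R.T               # protomer B = operator applied to A

idx = range(-6, 7)
hkl = np.array([(h, k, l) for h in idx for k in idx for l in range(1, 7)
                if (h, k) != (0, 0)], dtype=float)
hkl_mate = hkl @ R.T                # symmetry mates (-h, -k, l)

def change_correlation(shift_a, shift_b):
    """Correlation of the change in |F| between reflections and their mates."""
    before = np.vstack([prot_a, prot_b])
    after = np.vstack([prot_a + shift_a, prot_b + shift_b])
    d = amplitudes(hkl, after) - amplitudes(hkl, before)
    d_mate = amplitudes(hkl_mate, after) - amplitudes(hkl_mate, before)
    return float(np.corrcoef(d, d_mate)[0, 1])

shift = rng.normal(scale=0.005, size=(n_atoms, 3))
# Change made in one protomer only (no ncs restraint).
corr_one_protomer = change_correlation(shift, np.zeros_like(shift))
# Same change applied to both protomers through the operator (ncs constrained).
corr_ncs_constrained = change_correlation(shift, shift @ R.T)
print(corr_one_protomer, corr_ncs_constrained)
```

In the constrained case the structure stays exactly symmetric, so the changes at a reflection and its mate are identical. In the unconstrained case there is no such guarantee, which is the situation the text describes as "somewhat random".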
> >>> > >>> I have extensive experiments, again with the same 2CHR structure > >>> refining with I4 symmetry, showing that when you introduce a change in > >>> the structure by random shaking or molecular dynamics, the correlation > >>> between changes in Fc for "ncs" symmetry related atoms is close to zero, > >>> and occasionally negative. The slight positive average correlation may be > >>> attributed to sym-pairs that are close in reciprocal space (like 1,0,30 > >>> and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled > >>> not by ncs but by the G-function. Granted changes due to shaking might > >>> not be the same as changes due to refinement, but these were shaken > >>> starting from the refined position, and I assume that if they were > >>> refined > >>> from this randomly shaken position they would go back to the original > >>> refined position, in which case the Fc changes due to refinement would > >>> be equally uncorrelated. > >>> > >>> ---------- > >>> > >>> Coupling between reflections by the G function- > >>> Without saying exactly what is meant by couplings, reflections can be > >>> coupled in two ways. One, reflections are coupled to other reflections > >>> near > >>> them in reciprocal space. This is due to the fact that the molecular > >>> transform of the molecule is relatively smooth (due to the molecular > >>> transform being oversampled due to the asymmetric unit being larger than > >>> the structure contained?), so values of amplitude and > >>> phase for a reflection cannot differ too widely from those of neighboring > >>> reflections. Or because the scattering vectors of neighboring > >>> reflections are nearly parallel and similar in frequency so the projection > >>> of the density on them integrates similarly. > >>> (second is ncs-coupling) > >>> > >>> In general coupling of neighboring reflns is a good thing for > >>> crystallography. 
No one reflection is indispensable, because its
> >>> information is much the same as the other reflections in a cube of 26
> >>> surrounding reflections. This allows us to solve structures when the data
> >>> is only 80-90% complete, provided the missing reflections are randomly
> >>> scattered among the present reflections. It supports the "fill-in" fft map
> >>> procedure where FcΦc is used for missing reflections (the structure based
> >>> on surrounding reflections will be good enough to give a good estimate of
> >>> the missing structure factor). It makes possible resolution extension
> >>> during density modification or by the "free lunch" procedures of Dodson and
> >>> Sheldrick.
> >>>
> >>> And I would argue that this coupling is what makes cross-validation
> >>> (free-R) work. We say
> >>> that refining against the working reflections improves the structure,
> >>> making it more like the true structure, and thus the free Fc approach their
> >>> Fobs. But not because the good fairy looks at the structure and says "OK,
> >>> it's improved now, we can lower the R-free".
> >>> How does it work mathematically? If the reflections were completely
> >>> independent, if free and working reflections were not coupled through being
> >>> samples of the same molecular transform, then changes which improve the fit
> >>> to the working reflections would have no effect on the values of the free
> >>> reflections. It has to go through the structure, changes due to refining
> >>> against the working reflections affect the free reflections, which we can
> >>> call "coupling", and we know that is described by the G-function. If free
> >>> reflections were not coupled to working reflections, Rfree would never
> >>> change and thus would be useless.
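The limiting case of completely uncoupled working and free reflections can be made concrete with a small calculation. The sketch below is an assumption-laden toy rather than a real refinement: equal point atoms in a P1 cell, error-free "observed" amplitudes computed from the true positions, working reflections restricted to hk0 and free reflections to 00l, and "refinement" replaced by simply restoring the x and y coordinates of one misplaced atom (all that hk0 data can determine).

```python
import numpy as np

def amplitudes(hkl, xyz):
    """|F(hkl)| for unit point atoms at fractional positions xyz."""
    return np.abs(np.exp(2j * np.pi * hkl @ xyz.T).sum(axis=1))

def r_factor(f_obs, f_calc):
    return float(np.abs(f_obs - f_calc).sum() / f_obs.sum())

rng = np.random.default_rng(1)
true_xyz = rng.random((10, 3))
model = true_xyz.copy()
model[0] += [0.03, -0.02, 0.04]      # one atom misplaced in x, y and z

idx = range(-5, 6)
work_hkl = np.array([(h, k, 0) for h in idx for k in idx if (h, k) != (0, 0)],
                    dtype=float)     # working set: hk0 only
free_hkl = np.array([(0, 0, l) for l in range(1, 11)], dtype=float)  # free: 00l

f_obs_work = amplitudes(work_hkl, true_xyz)
f_obs_free = amplitudes(free_hkl, true_xyz)

refined = model.copy()
refined[0, :2] = true_xyz[0, :2]     # hk0 data can only correct x and y

r_work_before = r_factor(f_obs_work, amplitudes(work_hkl, model))
r_free_before = r_factor(f_obs_free, amplitudes(free_hkl, model))
r_work_after = r_factor(f_obs_work, amplitudes(work_hkl, refined))
r_free_after = r_factor(f_obs_free, amplitudes(free_hkl, refined))
print(r_work_before, r_work_after, r_free_before, r_free_after)
```

The model has improved and the working R drops to zero, but the free R is exactly what it was, because nothing learned from the working set reaches the free reflections.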
> >>>
> >>> For an example, suppose we refine the position of an atom, choosing
> >>> working reflections only in the plane l=0, and free reflections along the l
> >>> axis (assuming an orthorhombic system). The working reflections are only
> >>> sensitive to position in the x and y directions, so the z position would be
> >>> unchanged by the refinement. But the free reflections are only sensitive to
> >>> position along the z axis, so R-free would be unchanged. Presumably the
> >>> structure would be improved (if that one atom was slightly misplaced and
> >>> all other atoms correctly placed), but the Rfree would not improve. I would
> >>> say this is the direction Chapman and co. were heading with their thin
> >>> shells of free reflections isolated by thick shells of unused guard
> >>> reflections. If they really succeed in eliminating the "bias", then Rfree
> >>> will be unresponsive to refinement and so useless.
> >>>
> >>> Chapman et al. considered two kinds of coupling: that due to ncs and
> >>> direct coupling via Rossmann's G function. They found that choosing free set
> >>> in thin shells had little effect, in fact very thick shells with the
> >>> test reflections centered in the middle of the shell were required to
> >>> significantly reduce the "bias". Now the reciprocal space equivalent of
> >>> ncs operators are pure rotational operators, so they relate points in
> >>> reciprocal space with precisely the same resolution. Selecting free
> >>> reflections in thin shells should thus be sufficient to ensure that
> >>> ncs-related reflections have the same free-R flag and avoid bias. For
> >>> my case where ncs is really crystallographic, the shells could be
> >>> infinitely thin since the symm-related reflections have precisely the
> >>> same resolution. For real ncs the operator takes a reflection to a
> >>> non-Bragg position which is closely surrounded by reflections, coupled
> >>> to them by the G function.
> >>> In that case somewhat thicker shells would be required. But using very > >>> thick guard zones around the free reflections implies it is the > >>> G-function they are fighting, as they somewhat implicitly acknowledged > >>> by the > >>> discussion of thickness of shells in terms of the radius of the central > >>> maximum > >>> of the G function. In that case I wonder if ncs-coupling which still has > >>> to go through G-function coupling to bias a free reflection > >>> contributes significantly compared to the coupling of every reflection to > >>> its direct neighbors. > >>> > >>> By using thick guard zones of unused reflections, they end up refining > >>> with very incomplete data which would be expected to affect the refinement > >>> and raise the R-free just because the structure is less correct. They > >>> control for this by refining with another set in which the same number of > >>> reflections are deleted randomly. But this is not a satisfactory control, > >>> because it is generally agreed that missing reflections due to an empty > >>> zone in reciprocal space is more deleterious than missing reflections that > >>> are randomly scattered. > >>> Ironically this same "redundancy due to oversampling" that Chapman and > >>> co. discuss in their introduction allows neighboring reflections to impart > >>> most of the information of an isolated absent reflection. When the missing > >>> reflections are clustered together in a thick shell or wedge, a lot of > >>> information is not available and the structure will suffer. And in > >>> particular the structural details that determine structure factors in the > >>> center of the excluded zone will be poorly determined, since information > >>> pertaining to them is being excluded. So of course the R-factor calculated > >>> from these reflections will be higher than with randomly absent data. 
> >>> Furthermore, if G-function is the vehicle by which R-free follows R, R-free
> >>> will follow less closely and hence under-report what improvement is being
> >>> made.
> >>>
> >>> >
> >>> > On Sun, 19 May 2019 at 04:34, Edward A. Berry <ber...@upstate.edu> wrote:
> >>> >
> >>> > Revisiting (and testing) an old question:
> >>> >
> >>> > On 08/12/2003 02:38 PM, wgsc...@chemistry.ucsc.edu wrote:
> >>> > > *** For details on how to be removed from this list visit the ***
> >>> > > *** CCP4 home page http://www.ccp4.ac.uk ***
> >>> >
> >>> > > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>> > >>
> >>> > >> (1) you only need to take special care for choosing a test set if you _apply_
> >>> > >> the NCS in your refinement, either as restraints or as constraints. If you
> >>> > >> refine your NCS protomers without any NCS restraints/constraints, both your
> >>> > >> protomers and your reflections will be independent, and thus no special care
> >>> > >> for choosing a test set has to be taken
> >>> > >
> >>> > > If your space group is P6 with only one molecule in the
> >>> > > asymmetric unit but you instead choose the subgroup P3 in which to refine
> >>> > > it, and you now have two molecules per asymmetric unit related by "local"
> >>> > > symmetry to one another, but you don't apply it, does that mean that
> >>> > > reflections that are the same (by symmetry) in P6 are uncorrelated in P3
> >>> > > unless you apply the "NCS"?
> >>> > > >>> > =================================================== > >>> > The experiment described below seems to show that Dirk's initial > >>> > statement was correct: even in the case where the "ncs" is actually > >>> > crystallographic, and the free set is chosen randomly, R-free is not > >>> > affected by how you pick the free set. A structure is refined with > >>> > artificially low symmetry, so that a 2-fold crystallographic > >>> operator > >>> > becomes "NCS". Free reflections are picked either randomly (in which > >>> > case the great majority of free reflections are related by the NCS > >>> to > >>> > working reflections), or taking the lattice symmetry into account so > >>> > that symm-related pairs are either both free or both working. The > >>> final > >>> > R-factors are not significantly different, even with repeating each > >>> mode > >>> > 10 times with independently selected free sets. They are also not > >>> > significantly different from the values obtained refining in the > >>> correct > >>> > space group, where there is no ncs. > >>> > > >>> > Maybe this is not really surprising. Since symmetry-related > >>> reflections > >>> > have the same resolution, picking free reflections this way is one > >>> way > >>> > of picking them in (very) thin shells, and this has been reported > >>> not to > >>> > avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol > >>> 4, > >>> > 897-904. Also results of Chapman et al.(Acta Cryst. D62, 227–238). 
> >>> > And see:
> >>> >
> >>> > http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
> >>> >
> >>> > But this is more significant: in cases of lattice symmetry like this,
> >>> > the ncs takes working reflections directly onto free reflections. In the
> >>> > case of true ncs the operator takes the reflection to a point between
> >>> > neighboring reflections, which are closely coupled to that point by the
> >>> > Rossmann G function. Some of these neighbors are outside the thin shell
> >>> > (if the original reflection was inside; or vice versa), and thus defeat
> >>> > the thin-shells strategy. In our case the symm-related free reflection
> >>> > is directly coupled to the working reflection by the ncs operator, and
> >>> > its neighbors are no closer than the neighbors of the original
> >>> > reflection, so if there is bias due to NCS it should be principally
> >>> > through the sym-related reflection and not through its neighbors. And so
> >>> > most of the bias should be eliminated by picking the free set in thin
> >>> > shells or by lattice symmetry.
> >>> >
> >>> > Also, since the "ncs" is really crystallographic, we have the control of
> >>> > refining in the correct space group where there is no ncs. The R-factors
> >>> > were not significantly different when the structure was refined in the
> >>> > correct space group.
(Although it could be argued that that leads to a
> >>> > better structure, and the only reason the R-factors were the same is
> >>> > that bias in the lower symmetry refinement resulted in lowering Rfree
> >>> > to the same level.)
> >>> >
> >>> > Just one example, but it is the first I tried; no cherry-picking. I
> >>> > would be interested to know if anyone has an example where taking
> >>> > lattice symmetry into account did make a difference.
> >>> >
> >>> > For me the lack of effect is most simply explained by saying that, while
> >>> > of course ncs-related reflections are correlated in their Fo's and Fc's,
> >>> > and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
> >>> > _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
> >>> > (I can expound on this). Therefore whatever refinement is doing to
> >>> > improve the fit to working reflections is equally likely to improve or
> >>> > worsen the fit to sym-related free reflections. In that case it is hard
> >>> > to see how refinement against working reflections could bias their
> >>> > symm-related free reflections. (Then how does R-free work? Why does
> >>> > R-free come down at all when you refine? Because of coupling to
> >>> > neighboring working reflections by the G-function?)
> >>> >
> >>> > Summary of results (details below):
> >>> > 0. structure 2CHR, I422, as reported in PDB (with 2-Sigma cutoff)
> >>> >    R: 0.189  Rfree: 0.264  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> > 1. The deposited 2chr (I422) was refined in that space group with the
> >>> >    original free set. No Sigma cutoff, 10 macrocycles.
> >>> >    R: 0.1767  Rfree: 0.2403  Nfree:442(5%)  Nrefl: 9087
> >>> >
> >>> > 2. The deposited structure was refined in I422 10 times, 50 macrocycles
> >>> >    each, with randomly picked 10% free reflections
> >>> >    R: 0.1725±0.0013  Rfree: 0.2507±0.0062  Nfree: 908.9±  Nrefl: 9087
> >>> >
> >>> > 3.
> >>> > The structure was expanded to an I4 dimer related by the unused I422
> >>> > crystallographic operator, matching the dimer of 1chr. This dimer was
> >>> > refined against the original (I4) data of 1chr, picking free reflections
> >>> > in symmetry related pairs. This was repeated 10 times with a different
> >>> > random seed for picking reflections.
> >>> > R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
> >>> >
> >>> > 4. same as 3 but picking free reflections randomly without regard for
> >>> > lattice symmetry.
> >>> > On average 15 free reflections were in pairs, 212 were invariant under
> >>> > the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> > reflections.
> >>> > R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
> >>> >
> >>> > (** Average Rfree almost identical by coincidence - the individual
> >>> > results were all different)
> >>> >
> >>> > Detailed results from the individual refinement runs are available in
> >>> > a spreadsheet in dropbox:
> >>> > https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> >>> >
> >>> > Scripts used in running the tests are also there in NCSbias.tgz:
> >>> > https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
> >>> >
> >>> > ========================================
> >>> >
> >>> > Methods:
> >>> > I would like an experiment where relatively
> >>> > complete data is available
> >>> > in the lower symmetry. To get something that is available to everyone, I
> >>> > choose from the PDB. A good example is 2CHR, in space group I422, which
> >>> > was originally solved and the data deposited in I4 with two molecules in
> >>> > the asymmetric unit (structure 1CHR).
> >>> >
> >>> > 2CHR statistics from the PDB:
> >>> >   R      R-free  complete   (Refined 8.0 to 3.0 A
> >>> >   0.189  0.264   81.4        reported in PDB, with 2-Sig cutoff)
> >>> >   Nfree=442 (4.86%)
> >>> > Further refinement in phenix with same free set, no sigma cutoff:
> >>> > 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> >>> > Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> >>> > Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> >>> > Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> >>> > (2chr_orig_001.pdb,
> >>> >
> >>> > The number of free reflections is small, so the uncertainty
> >>> > in Rfree is large (a good case for Rcomplete)
> >>> > Instead for better statistics, use new 10% free set and repeat 10 times;
> >>> > 50 macrocycles, with different random seeds:
> >>> > R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> >>> > Nfree: 908.9±0.32 Nrefl: 9087
> >>> >
> >>> > For artificially low symmetry, expand the I422 structure (making what I
> >>> > call 3chr for convenience although I'm sure that ID has been taken):
> >>> >
> >>> > pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> >>> > exclude header
> >>> > spacegroup I4
> >>> > cell 111.890 111.890 148.490 90.00 90.00 90.00
> >>> > symgen X,Y,Z
> >>> > symgen X,1-Y,1-Z
> >>> > CHAIN SYMMETRY 2 A B
> >>> > eof
> >>> >
> >>> > Get the structure factors from 1CHR: 1chr-sf.cif
> >>> > Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> >>> > This file has no free set (deposited 1993) so tell phenix to generate
> >>> > one.
> >>> > I don't want phenix to protect me from my own stupidity, so I use:
> >>> > generate = True
> >>> > use_lattice_symmetry = False
> >>> > use_dataman_shells = False
> >>> > (the .eff file with all non-default parameters is available as
> >>> > 3chr_rand_001.eff in the .tgz mentioned above)
> >>> >
> >>> > For more significance, use the script multirefine.csh to repeat the
> >>> > refinement 10 times with a different random seed. After each run, grep
> >>> > significant results into a log file.
> >>> >
> >>> > To check this gives free reflections related to working reflections, I
> >>> > used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
> >>> > data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
> >>> > (columns 4-5) and with h<k (col 6-7), and listed the pairs, thusly:
> >>> >
> >>> > mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> >>> > LABIN FP=F-obs DUM1=R-free-flags
> >>> > OUTPUT USER '(3I4,2F10.5)'
> >>> > eof
> >>> > sortfree <<eof >sort3.hkl
> >>> >
> >>> > sort3.hkl looks like:
> >>> >            ______h>k______   ______h<k______
> >>> >   h  k  l        F   free          F*  free*
> >>> >   1  2  3   208.97   0.00     174.95   0.00
> >>> >   1  2  5   226.85   0.00     191.65   0.00
> >>> >   1  2  7   144.85   0.00     164.86   0.00
> >>> >   1  2  9   251.26   0.00     261.71   0.00
> >>> >   1  2 11   333.84   0.00     335.18   0.00
> >>> >   1  2 13   800.37   0.00     791.77   0.00
> >>> >   1  2 15   412.92   0.00     409.90   0.00
> >>> >   1  2 17   306.99   0.00     317.53   0.00
> >>> >   1  2 19   225.54   0.00     220.91   0.00
> >>> >   1  2 21   101.20   1.00*    104.84   0.00
> >>> >   1  2 23   156.27   0.00     156.49   0.00
> >>> >   1  2 25   202.97   0.00     202.23   0.00
> >>> >   1  2 27   216.10   0.00     219.28   0.00
> >>> >   1  2 29   106.76   0.00     100.93   0.00
> >>> >   1  2 31   157.32   0.00     154.37   1.00*
> >>> >   1  2 33    71.84   0.00      20.78   0.00
> >>> >   1  2 35   179.05   0.00     165.67   0.00
> >>> >   1  2 37   254.04   0.00     239.96   1.00*
> >>> >   1  2 39    69.56   0.00      30.61   0.00
> >>> >   1  2 41    56.20   0.00      51.02   0.00
> >>> >
> >>> > , and awked for 1 in the free columns.
> >>> > Out of 6922 pairs of reflections,
> >>> > in one case:
> >>> > 674 in the first asu (h>k) are in the free set,
> >>> > 703 in the second asu (h<k) are in the free set
> >>> > only 11 pairs have the reflections in both asu free.
> >>> >
> >>> > out of 16011 refl in I4,
> >>> > 6922 pairs (=13844 refl), 1049 invariant (h=k or h=0), 1118 with absent mate.
> >>> >
> >>> > out of 1601 free reflections:
> >>> > On average 15 free reflections were in pairs, 212 were invariant under
> >>> > the operator (no sym-mate) and 1374 (86%) were paired with working
> >>> > reflections.
> >>> >
> >>> > Then do 10 more runs of 50 macrocycles with:
> >>> > use_lattice_symmetry = False
> >>> > collecting the same statistics
> >>> > (also scripted in multirefine.csh)
> >>> >
> >>> > Finally, use ref2chr.eff to refine (as previously mentioned) a
> >>> > monomer in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> >>> > (also scripted in multirefine.csh)
> >>> >
> >>> > ########################################################################
> >>> >
> >>> > To unsubscribe from the CCP4BB list, click the following link:
> >>> > https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1
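
For anyone wanting to repeat the pairing check quoted above without the Fortran helper, here is a minimal Python sketch of the same bookkeeping. It is not the sortfree.f program from the .tgz. It assumes that reflections are indexed in an I4 asymmetric unit with non-negative indices, that the unused I422 operator pairs (h,k,l) with (k,h,l), and that reflections with h=k or h=0 are invariant, as stated in the message. The function names classify_free and assign_free_flags are hypothetical and belong to neither phenix nor CCP4.

```python
import random
from collections import Counter


def classify_free(flags):
    """Count free reflections by the status of their (k,h,l) mate.

    flags maps (h, k, l) -> True if the reflection is in the free set.
    """
    counts = Counter()
    for (h, k, l), is_free in flags.items():
        if not is_free:
            continue
        if h == k or h == 0:
            # Assumed invariant under the extra operator (no sym-mate).
            counts["invariant"] += 1
            continue
        mate = (k, h, l)
        if mate not in flags:
            counts["absent_mate"] += 1      # mate was not measured
        elif flags[mate]:
            counts["both_free"] += 1        # mate is also free
        else:
            counts["paired_with_working"] += 1
    return counts


def assign_free_flags(hkls, fraction, rng, by_pair):
    """Flag roughly `fraction` of the reflections as free.

    by_pair=True gives both members of a pair the same flag;
    by_pair=False flags every reflection independently.
    """
    if not by_pair:
        return {hkl: rng.random() < fraction for hkl in hkls}
    # One random draw per pair, keyed on the sorted (h, k) indices.
    keys = sorted({(min(h, k), max(h, k), l) for h, k, l in hkls})
    free_keys = {key for key in keys if rng.random() < fraction}
    return {(h, k, l): (min(h, k), max(h, k), l) in free_keys
            for h, k, l in hkls}


if __name__ == "__main__":
    # Synthetic index list, only to exercise the functions.
    hkls = [(h, k, l) for h in range(8) for k in range(1, 8)
            for l in range(1, 6)]
    for by_pair in (False, True):
        flags = assign_free_flags(hkls, 0.10, random.Random(1), by_pair)
        print("by_pair =", by_pair, dict(classify_free(flags)))
```

With by_pair=True no free reflection can have a working mate, which corresponds to test 3 above. With by_pair=False most free reflections end up paired with working ones, which corresponds to test 4.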
> >> ------
> >> Randy J. Read
> >> Department of Haematology, University of Cambridge
> >> Cambridge Institute for Medical Research    Tel: + 44 1223 336500
> >> The Keith Peters Building                   Fax: + 44 1223 336827
> >> Hills Road                                  E-mail: rj...@cam.ac.uk
> >> Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk

--
===============================================================
*                                                             *
* Gerard Bricogne                     g...@globalphasing.com  *
*                                                             *
* Global Phasing Ltd.                                         *
* Sheraton House, Castle Park     Tel: +44-(0)1223-353033     *
* Cambridge CB3 0AX, UK           Fax: +44-(0)1223-366889     *
*                                                             *
===============================================================