Dear Kay,

Thank you for your reply to our message. Your "few remarks" actually raise a considerable number of interrelated matters that need to be considered in their totality rather than piecemeal, so this reply will necessarily be rather lengthy. It was in any case our purpose in creating this separate thread to bring all these matters under simultaneous scrutiny, as we felt previous posts were alluding to subsets of them without really dealing fully with their interrelatedness.

The double interleaving of our respective contributions to this thread may be inelegant, but hopefully will not be confusing.

On Thu, Oct 06, 2022 at 11:32:35AM +0100, Kay Diederichs wrote:
> Dear Gerard,
>
> I'm not going to comment on what others said in this (new) thread; just
> trying to make a few remarks about what you write below -
>
> On Tue, 4 Oct 2022 17:01:10 +0100, Gerard Bricogne <g...@globalphasing.com>
> wrote:
>
> >Dear all,
> >
> > First of all, apologies for breaking the threads entitled "PAIREF -
> >Warning - not enough free reflections in resolution bin" and "Anisotropy" by
> >merging them into a new one, but it somehow felt rather against nature to
> >keep them separate.
> >
> > Since the early days of the availability of STARANISO [1] (the actual
> >starting year for the Web server [2] was 2016), we had a hunch that much of
> >what was happening in the PAIREF procedure might simply be the detection of
> >the existence of significant data beyond an initially chosen resolution
> >cut-off not only as a result of an excessively conservative criterion having
> >been applied in that initial choice, but as a consequence of anisotropy in
> >the data.
> Why "much of what was happening ... as a consequence of anisotropy"?
> These words imply that datasets where PAIREF indicates "existence of significant
> data beyond an initially chosen resolution cut-off" (EOSDBAICRC) are
> anisotropic, but that is a) not the case, because PAIREF - or paired
> refinement in general - in my experience, and that of others, often
> indicates EOSDBAICRC also for isotropic data, b) this depends on the
> initial cutoff. So your general statement (or hunch?) cannot be correct.

We do not see such an implication, but rather the reverse: these words state that when data are anisotropic (according to the context established by the Subject line of this thread), PAIREF is likely to suggest a higher resolution cut-off than would typically have been chosen initially, for the simple reason given in our paragraph immediately below - i.e. the idea that such an initial choice would have been a "compromise", or intermediate, isotropic cut-off between the lowest and highest diffraction limits. Perhaps that paragraph is worth re-reading:

> > The latter would give rise to different diffraction limits in
> >different directions, and the choice of a single value for "the resolution"
> >at which the data were cut off would necessarily yield a compromise value
> >between the best and the worse diffraction limits. This would imply that
> >significant data would be excluded in the best diffracting directions, that
> >would subsequently drive PAIREF towards increasing the estimated resolution
> >compared to its compromise value.

This viewpoint was self-evident to us at the time we were starting the work on STARANISO because we constantly had in mind the fundamental picture

https://staraniso.globalphasing.org/anisocut.png

(referred to as "The Picture" in the rest of this message) that ended up being incorporated into the documentation on the server at

https://staraniso.globalphasing.org/anisotropy_about.html

(it would be very useful to keep that picture displayed at all times while reading this message).
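
The compromise described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions, not what STARANISO computes: the boundary of significant data is assumed to be an ellipsoid with semi-axes 1/d along three orthogonal principal directions, the diffraction limits (2.0, 2.5, 3.5 Angstrom) are made-up values, and reciprocal space is sampled on an arbitrary cubic grid rather than on a real lattice. The function name `classify` and its category labels are hypothetical.

```python
import itertools

def classify(limits, d_iso, step=0.02):
    """Count reciprocal-space grid points by category for an isotropic cut-off.

    limits : (d_a, d_b, d_c), hypothetical diffraction limits in Angstrom along
             three orthogonal principal directions (assumed ellipsoidal surface).
    d_iso  : isotropic resolution cut-off in Angstrom.
    step   : grid spacing in 1/Angstrom (arbitrary sampling, not a real lattice).
    """
    sa, sb, sc = (1.0 / d for d in limits)   # semi-axes of the ellipsoid
    r_iso = 1.0 / d_iso                      # radius of the cut-off sphere
    n = int(max(sa, sb, sc, r_iso) / step) + 1
    axis = [i * step for i in range(-n, n + 1)]
    counts = {"kept_significant": 0, "lost_significant": 0, "kept_noise": 0}
    for x, y, z in itertools.product(axis, repeat=3):
        in_ellipsoid = (x / sa) ** 2 + (y / sb) ** 2 + (z / sc) ** 2 <= 1.0
        in_sphere = x * x + y * y + z * z <= r_iso * r_iso
        if in_ellipsoid and in_sphere:
            counts["kept_significant"] += 1   # significant and included
        elif in_ellipsoid:
            counts["lost_significant"] += 1   # significant but cut away
        elif in_sphere:
            counts["kept_noise"] += 1         # included but essentially noise
    return counts

limits = (2.0, 2.5, 3.5)  # made-up best, intermediate and worst limits
for d_iso in (2.0, 2.5, 3.5):
    print(d_iso, classify(limits, d_iso))
```

In this toy model a cut-off at the best limit loses no significant points but keeps the most noise, a cut-off at the worst limit keeps no noise but loses the most significant points, and any intermediate value does some of both.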
Regarding your point (a), therefore: we did write that this "hunch" referred to a situation where the data would be anisotropic - this is what "much of" was meant to subsume. Regarding point (b), the picture and the "compromise" nature of typical cut-off choices reconcile your statement with ours. For isotropic data we would not dispute that PAIREF might be able to detect that an initial cut-off might have been too conservative.

> In its current implementation, PAIREF tries to determine the isotropic
> resolution cutoff that gives the best model based on valid comparisons of
> (mainly) Rfree values (it also gives other information to the user).

To keep this discussion as focussed as possible, we thought we would try to stick to the rules of the BBC Radio 4 talk show "Just a Minute" and write about our subject "without hesitation, deviation or repetition". This is notoriously hard, and the choice of Rfree as a decision criterion seems like a potential diversion - so we will leave this for another time. However the impact of anisotropy on the behaviour of that criterion needs to be considered, which is best done by reference to The Picture mentioned earlier: if a single cut-off is used for all categories of data, many reflections in the free set will land in the red region where the measurements are essentially noise. Their contributions to Rfree would then be unresponsive to improvements in the model, as the latter would not predict noise better, and the proportion of unresponsive Rfree contributors would increase as the isotropic resolution cut-off increases.

> This is the correct thing to do for isotropic data, and still useful for
> moderately anisotropic data, but clearly there is room for improvement,
> e.g. by using an anisotropic high-resolution cutoff, or by using data from
> STARANISO, or ...
> We (the authors of the PAIREF paper) have been discussing the treatment of
> anisotropy in the past, but we were under the impression that there is not
> an obvious single best way to deal with anisotropy.

We (the authors of this reply) have been considering this problem for many years now and have produced STARANISO, which is used both in a server (just about to reach 20,000 successful submissions since January 2016) and in autoPROC. We became convinced very early that the key to a satisfactory solution to the problem you are discussing resided in breaking away from the limitations of isotropic thinking and of the dilemma it created between the inclusion of significant vs. pure-noise data (again, see The Picture) when having only an isotropic cut-off, i.e. the radius of a sphere, to play with. This is the whole rationale for using an *anisotropic* cut-off surface instead of a sphere, something that was originally proposed by Michael Sawaya in Strong et al. in 2006

https://www.pnas.org/doi/full/10.1073/pnas.0602606103

and was implemented in the UCLA Diffraction Anisotropy Server:

https://srv.mbi.ucla.edu/Anisoscale/discussion

What criterion, and what threshold value for that criterion, to use for this purpose, is something we have put a lot of work into and where we differ substantially from the UCLA Server implementation, but the essential idea is the same.

> > This "hunch" was validated by a detailed comparison carried out on the
> >exact same examples that are considered in the 2020 paper by Maly et al.,
> >that is summarised in the attached PDF. In other words, whenever anisotropy
> >is present in the data, PAIREF will tend to indicate a higher value for an
> >isotropic cut-off than would have been estimated for the initial dataset.
>
> based on what?? Different people employ different initial resolution
> cut-offs, based on their prior experience.
> Your general statement above assumes a certain decision mode that I'd say
> is not universally valid.
The statement in the second sentence of our paragraph immediately above your question, and our answer to your question itself, should preferably be read while viewing The Picture. You list different habits or lessons from experience that people might bring to bear on the initial choice of an isotropic resolution cut-off that will subsequently be reconsidered by PAIREF. One thing is certain, however: when the data are anisotropic but everyone has in mind only the choice of an isotropic cut-off, few people would be so conservative as to choose the lowest diffraction limit (as, e.g., CC_1/2 would be too high) or so audacious as to choose the highest one (CC_1/2 would be too low, or some other statistics would have alarming values). An initial resolution cut-off for an anisotropic dataset will therefore almost always lie between the extreme diffraction limits, so that PAIREF would be very likely to detect - what do you call it - EOSDBAICRC. This seems almost as obvious as the definition of anisotropy itself. More of the same viewpoint is formulated in our paragraph immediately below.

> >The problem with taking the PAIREF result as the final answer is that the
> >higher cut-off it indicates is applied *isotropically*. The inclusion of the
> >significant data thus reclaimed is therefore unavoidably accompanied by that
> >of noisy data in the worst diffracting direction(s), resulting in alarmingly
> >poor statistics in the outermost shell (as pointed out in Eleanor's message)
> >that may cast doubts on the usefulness of the procedure.
> To my understanding, Eleanor's message was not about PAIREF, but you cite
> it as if it were. I don't like this.

We were referring to Eleanor's earlier reply to Matt McLeod's message that initiated the PAIREF-related thread in which she suggested a resolution cut-off on the basis of the statistics included by Matt in his message.
Already there, she made a statement of belief that very weak data cannot harm refinement because they tend to be properly (down)weighted. [By the way, down-weighting according to Rfree would seem to raise questions about whether the free set is really free, but that would be a "deviation" according to the "Just a Minute" rules]. We cite Eleanor's message in relation to that belief because it was at hand, but it is widespread and has even morphed in some people's minds into a belief in the rather paradoxical notion of "informative very weak data".

How does this relate to PAIREF? We would argue that PAIREF, being a procedure that comes up with a resolution cut-off for a dataset through a criterion that does not seem to be internal to that dataset alone, has left room for confusion about which intrinsic characteristic of that dataset is driving it towards the cut-off it proposes. We have witnessed a specific confusion in some of our users in the form of an apprehension towards the idea of applying the STARANISO cut-off because, according to their previous experience based on isotropic cut-offs only, they had come to the view that they needed all the weak data to see their ligand. In due course STARANISO was run, and the ligand density, far from disappearing, became stronger and cleaner thanks to the inclusion of more data in the best-diffracting direction(s) *and* to the exclusion of the noisy measurements in the worst-diffracting ones. Referring to The Picture: enlarging the green region without simultaneously enlarging the red one.

A lengthy and belaboured answer, perhaps, but this is why the matter of various, sometimes mistaken beliefs regarding very weak data seemed to us to be relevant to a discussion of PAIREF in an anisotropic context. Yet another recasting of our comments on The Picture is found in the first half of the paragraph immediately below.
> > This consideration
> >was the basis of the rationale for implementing an *anisotropic* cut-off
> >surface in STARANISO, so that one could thus reclaim the significant data in
> >the best-diffracting direction(s) while avoiding the simultaneous inclusion
> >of the pure-noise measurements in the worse one(s). While this is clearly
> >and extensively explained in the documentation provided on the STARANISO
> >server [2], it seems to be far from having been assimilated. Of course this
> >would be perfect material for a publication, but life is somehow too short,
> >and our to-do list has remained too long, to leave us room for spending the
> >necessary time to go through the process of putting a paper together. The
> >truly important matter is to get our picture in front of the user community.
> It would actually be good to have a proper paper!

Indeed, everyone tells us so. However we think that the STARANISO server contains a much more detailed and comprehensive description of what the program does, with the associated terminology gathered into a Glossary,

https://staraniso.globalphasing.org/staraniso_glossary.html

than any paper would be likely to accommodate. The contents of a paper would be frozen in time, whereas the documentation can be continuously updated as the program evolves. The server has been a very good development tool for us, as we have been able to witness and investigate every problem that arose with user submissions much better than if they had occurred within a user's own working environment. This has provided us with a broader bandwidth through which to interact with users and evolve the program towards maximum utility to them and investigative functionality for us. If users do not read and assimilate the documentation on the server, what benefit would they derive from a paper? Perhaps a more conventional form of citation, and little else.
In any case, we are under less pressure to "Publish or perish" (for us it is rather "Innovate, deliver and support, or perish") and this diminishes the priority of journal publications. Instead we have invested a lot of effort, attention and hard work into the activities of the PDBx/mmCIF Subgroup on Data Collection and Processing to enlarge the mmCIF dictionary

https://www.rcsb.org/news/feature/61df48320fea311d064aa4de

to make it possible to archive anisotropic statistics that can subsequently be used to create various types of input to re-refinement, using different sets of reflection data at different stages of scaling/merging (provided by e.g. autoPROC's Data_1_autoPROC_STARANISO_all.cif file), and different threshold values for various cut-off criteria - see increment 5.339 on 16 Feb 2021 in

https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/

(more about this later).

> STARANISO is a very valuable program. I do use it a lot, and have seen
> great improvements in maps. But there are open questions.

For us STARANISO and the use of its output in the refinement process are still very much work in progress, and their present forms make no claim to being definitive: all feedback about open questions is welcome.

> First, there is always a danger associated with modifying experimental
> data, so I'm not sure I like the default of STARANISO that leads to an
> up-scaling of data along the weak direction(s). I'd rather see this
> up-scaling implemented in the refinement program(s) which write out the
> coefficients for map calculation.

We are in agreement regarding the possible drawbacks of any sharpening of data, as is done in STARANISO. We took as a default what the UCLA DAS did, the merits of which were documented in the paper by Strong et al., but any user who so wishes can switch it off (by unchecking a box in the submission page on the server, and by specifying an extra command-line flag in autoPROC).
Doing this in the refinement programs would be a possibility, but the impact would not be limited to the writing out of map coefficients: prior to that, anisotropic input data would have to be used to compute e.g. R-factor values in spherical resolution shells, and would therefore have to be anisotropy-corrected. When all is said and done, the necessary corrections would amount to applying the same corrections to unsharpened input data as STARANISO applies to produce the sharpened input - which seems like "six of one and half a dozen of another".

> Second, (from the POV of Randy Read not an open question IIUC) STARANISO
> data should not be used for MR in Phaser.

So be it - there is no implied obligation on us to only do things that can be used for MR in Phaser - but we have seen large numbers of cases of successful use of STARANISO output for MR (either through Phaser or Molrep), even though that might not be considered as the ideal data to be used there. Different types of output from autoPROC or the STARANISO server can easily be selected for the MR step if desired.

> Third, I'd like to know if substructure solution works better with data
> from STARANISO than with the original data.

This is a different question altogether, but no practical instance of it has come our way so far. Data intended for use in experimental phasing usually have to have a high average value of I/sig(I), so that the small phasing differences derived from them can have significant delta/sig(delta) values. It would therefore be advisable to use a high value for the local average I/sig(I) to define the anisotropic cut-off surface. Your question would be likely to also apply to the sharpening step, in which case the outcome would depend on the relation of the minimum distance between atoms in the substructure to the diffraction limit in the worst direction(s) at that high threshold value.

> Fourth, to me a (STARANISO default) cutoff of I/sigI at 1.2 is arbitrary.
The term "arbitrary" seems an exaggeration, in the sense that it could not be 1,000,000. We would call it "sensibly low". Much careful work was done behind the scenes to pick this value as a sensible default.

> Yes I know I can modify it, but given that the STARANISO calculation is
> not instantaneous, I'd rather have a cutoff that is variable, and is
> optimized for the given data and model - exactly what PAIREF does. Also,
> the sigI values are not very reproducible across different data processing
> programs.

Regarding the general problem of providing default parameter values to programs that are not instantaneous: if you have some advice here, we would love to hear about it ;-) . We have been working on, and are close to having, precisely what you describe. When you allude to this very desirable program, but then say that this is "exactly what PAIREF does", we have to differ, as PAIREF does not deal with anisotropy. You probably meant "exactly what PAIREF does in the case of isotropic data", and the difference between these two statements is the heart of the matter we are discussing. Currently we do not optimise the threshold but we now have everything set up to do it in the very near future.

Finally, regarding the non-reproducibility of sigI values between different data processing programs: this is an open question, which needs efforts both in (1) collecting data (e.g. according to high-multiplicity strategies) that are better suited to producing reliable estimates of sigI, and (2) understanding the root causes of the current non-reproducibility. In any case STARANISO has a user-selectable capability to define its cut-off surface on the basis of a local average of a criterion we call wCC_1/2 rather than I/sig(I) if so desired. Giving more details at this stage would be a deviation, but they are available to anyone interested.
> > Now that the combined topics of PAIREF and anisotropy are being brought
> >to the foreground of the community's attention, this seems like the perfect
> >opportunity to present our analysis and position: what PAIREF achieves in
> >terms of an upward revision of an initial isotropic resolution cut-off is
> >likely to be achieved more straightforwardly by submitting the same data to
> >the STARANISO server (or using it within autoPROC [3]); and the STARANISO
> >output will have the advantage of being devoid of the large extra amount of
> >purely noisy, uninformative data that are retained in the output from PAIREF
> >according to its revised isotropic cut-off.
> By saying so, you imply that the default cutoff that STARANISO uses gives
> the best results. I don't agree, for the same reasons that apply to the
> choice of high-resolution cutoffs for isotropic data - any fixed cutoff
> based on some indicator is arbitrary (why is a I/sigI cutoff of 1.2 better
> than 1.1 or 1.3 or ...? is there a proof?); the cutoff must depend on the
> model (a bad model does not benefit from weak data); the cutoff must also
> depend on the refinement program - e.g. phenix.refine does not take the
> sigI into account. Paired refinement would be a better way because it
> informs the user about the consequences (on the model and its R-values in
> a fair comparison) of a certain cutoff - the cutoff does not have to be
> based on resolution, but could be based on local I/sigI or the like.

You have some valid points here, and we may well have been overly categorical in making our statement. On the other hand your use of the word "arbitrary" again seems rather excessive: a default of 1.2, when some further analysis might end up indicating 1.05, is not "arbitrary". The case of serial crystallography data would, admittedly, need to be considered separately.
We agree that a coupling with refinement has to be part of the final picture, and STARANISO in its present state and mode of use is not our final statement. Anisotropy became such a major issue in MX with the sudden successes in crystallising membrane proteins that it seemed to invite urgent fresh thinking. A clear top priority was to break away from the use of an isotropic cut-off surface to avoid the loss of significant data in the best-diffracting direction(s) and eliminate the often extremely noisy data in the other(s) by adapting (and, we think, improving) what was available on the UCLA DAS. Doing this properly and fully entailed much questioning of basic thinking habits based on a purely isotropic vision of the world - such as there now being three different "diffraction limits" rather than a single number for "resolution"; or a need to consider two types of completeness in the context of an anisotropic cut-off.

Our first aim was to make data processing extract all significant information from a set of diffraction images on the basis of *intrinsic* criteria only (i.e. involving only the data themselves) and without any form of self-censorship on the basis of what would be done downstream with the data. By self-censorship we meant e.g. the often-heard reasoning in the line of "I must cut the data off so that xxxx is at least yyyy, otherwise the reviewers will clobber me ...", resulting in loss of informative data. That first "downstream-agnostic" aim having led us to the current version of STARANISO, fine-tuning its choice of a cut-off threshold by *extrinsic* criteria (e.g. by interaction with refinement of a model) does look like the next step. Paired refinement (as a procedure) on STARANISO outputs for different values of its cut-off threshold is a near-immediately available option, but this cannot be described as "what PAIREF (the program) would do".
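
The option just mentioned, paired refinement across a series of STARANISO cut-off thresholds, could be organised along the following lines. This is a minimal sketch and not the interface of PAIREF, STARANISO or any refinement program: the function `scan_thresholds` and the callback `rfree(fit, evaluate)` are hypothetical names, and the callback is assumed to refine a model against the data selected at threshold `fit` and return the Rfree computed on the data selected at threshold `evaluate`. The acceptance rule (take the step to more data only if Rfree on the common, more restrictive data does not get worse) follows the general paired-refinement idea, with Rfree as one possible criterion among others.

```python
from typing import Callable, Sequence

def scan_thresholds(thresholds: Sequence[float],
                    rfree: Callable[[float, float], float],
                    tolerance: float = 0.0) -> float:
    """Return the lowest threshold accepted by stepwise paired comparisons.

    thresholds : strictly decreasing cut-off thresholds; a lower value
                 means that more (weaker) data are included.
    rfree      : HYPOTHETICAL callback rfree(fit, evaluate), see lead-in.
    tolerance  : allowed increase in Rfree for a step to be accepted.
    """
    accepted = thresholds[0]
    for candidate in thresholds[1:]:
        # Both models are judged on the same data: those at `accepted`.
        baseline = rfree(accepted, accepted)
        extended = rfree(candidate, accepted)
        if extended <= baseline + tolerance:
            accepted = candidate      # the extra data did not hurt the model
        else:
            break                     # stop at the first rejected step
    return accepted

# Toy stand-in for real refinements: Rfree is best for fits near 1.1.
def toy_rfree(fit: float, evaluate: float) -> float:
    return 0.25 + 0.05 * abs(fit - 1.1)

print(scan_thresholds([2.0, 1.6, 1.2, 0.8, 0.4], toy_rfree))
```

With real data each call to the callback would be a full refinement run, so the results for each (fit, evaluate) pair would normally be cached rather than recomputed.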
On the minor matter that Phenix refinement does not use sigIs, this does not seem a valid reason for holding others back from using them. With modern high-multiplicity data collection strategies, there is no excuse for not producing sensible sigIs and not using them in refinement in the case of "Macrocrystals". Serial crystallography remains a different matter.

This being said, there are still many open questions regarding not only which criteria and threshold to use in the definition of data cut-offs, but also in the quantitation of model improvements as these cut-offs are varied. However we feel that the groundwork and capabilities embodied in STARANISO provide a solid basis for the first part of that exploration, which it would make sense to also take as a basis for the second part.

> > We would very much welcome feedback on this position: indeed we would
> >like to *crowd-source* the validation (or refutation) of this conclusion. In
> >our view, continuing to use the PAIREF procedure to revise an isotropic
> >resolution cut off misses the point about the consequences of anisotropy.
> Here too you imply that all datasets are anisotropic.

An easy one, this one :-) : in our experience and that of many of our collaborators who carry out numerous structure determinations, anisotropy is present wherever it can occur. Even a crystal with cubic symmetry can show a non-spherical cut-off surface, even though the best-approximating ellipsoid to that surface is a sphere. If anisotropy is negligible, this conclusion can only be reached by first characterising it in 3D. Assuming isotropy and hoping for the best, rather than observing it as negligible anisotropy, will lead to a biased view of its prevalence. Membrane protein crystallography has flooded the field with instances of severe anisotropy for the past decade.
Ultra-high resolution crystallography also makes anisotropy stand out, as the effect of any differences between the B values in the principal directions is amplified by the large values of (d_star)^2 .

> >The only sensible use of a PAIREF-like procedure would be to adjust the
> >cut-off threshold for the local average of I/sig(I) in STARANISO, whose
> >default value is currently 1.2 but can be reset by the user through the Web
> >server's GUI. We occasionally see datasets of very high quality for which
> >the CC_1/2 value in the outermost shell stays above 0.6 or even 0.7, and it
> >is quite plausible that further useful data could be rescued if the local
> >I/sig(I) cut-off threshold were lowered below 1.2.
> The way you phrase it appears to diminish the value of a PAIREF-like
> procedure. To the contrary, I'd think it would be valuable and I'd like to
> see exactly such a procedure.

Quite the contrary: in the paragraph above we advocate, as the natural follow-up to our work embodied in STARANISO and your and your colleagues' work embodied in the PAIREF program, that the paired refinement procedure should be applied to the optimisation of the anisotropic cut-off threshold in STARANISO. The matter of whether monitoring the Rfree values reached in refinements against STARANISO output data cut off at different threshold values is the best way of finding an optimum could be addressed, and other criteria could be experimented with. Our concerns with the Rfree criterion used in PAIREF in the presence of substantial anisotropy were voiced in the paragraph immediately below:

> > Concerning Eleanor's view that noisy data can't hurt refinement because
> >they are properly down-weighted by the consideration of e.g.
> >Rfree values in
> >resolution shells, we would point out that any criterion based on statistics
> >in resolution shells will be polluted if the data are anisotropic and if the
> >noisy data that STARANISO would reject are retained. That will result in
> >excessive down-weighting of the significant data that STARANISO retains,
> >hence in losing the information they contain. Perhaps this is a matter for
> >later discussion, but the main idea is that retaining pure-noise data is not
> >neutral in refinement, and that every "isotropic thinking habit" on which
> >many views are based needs to be revisited.
> My view here is that the existence of a "best" resolution cutoff (e.g. as
> a minimum in Rfree) that we often see in paired refinement appears to prove
> that the inclusion of data beyond that limit is somewhat detrimental to the
> model. Meaning that inclusion of noise is not recommended - and emphasizing
> the value of cutting the data in a smart(er) way.

We are in total agreement here about the fact that "inclusion of noise is not recommended". Following that same logic in an anisotropic setting, one should then be concerned not only about not including more noise by extending the isotropic cut-off beyond the "best" resolution just found, but also about getting rid of all the noisy data that are included within the spherical cut-off surface corresponding to that resolution. This is what STARANISO will already have done ahead of time (by removing the red regions in The Picture) if its output is used in a paired refinement procedure.

> To summarize what I want to say: a) I don't find your assessment of the
> merits of PAIREF to be balanced. b) I think it would be worthwhile to
> optimize the data cutoff based on local I/sigI or similar - so I'd wish
> there were a combination of STARANISO and PAIREF (which you seem to see as
> non-equitable alternatives).
Regarding (a): we hope that the effort we put into leaving no stone unturned in this discussion will help you reconsider this statement.

Regarding (b): we have mentioned in many places in this message, and had already indicated the same in the previous one, that your wish can almost "instantly" be turned into reality by running paired refinements on STARANISO outputs corresponding to a set of trial values for its cut-off threshold. This would also enable the question of whether sharpening the data in the weak directions is beneficial, detrimental or neutral to the process to be investigated. We wrote "running paired refinements" above, rather than "running PAIREF", because the latter may require a number of substantial changes to adapt it to new operating conditions such as the non-spherical cut-off surface.

> One more word: sorry, I don't have the time currently to continue this
> thread from my side.

The length of our reply may mean that you will never have time to read it ;-) but we hope that what we wrote will be of interest to the broader CCP4BB readership and will clarify many of the matters we touched upon.

Finally, we thank you for raising so many questions in so few words, which stimulated us to answer them and to review all matters associated with them as thoroughly as seemed necessary. Hesitation wasn't experienced at any stage of writing, deviation was hopefully kept to a minimum, but repetition couldn't be avoided.

With best wishes,

Ian, Clemens, Claus and Gerard.

--
> Best wishes,
> Kay
>
>
> >
> > With best wishes,
> >
> >Clemens, Claus, Ian and Gerard.
> >
> >
> >[1] Tickle, I.J., Flensburg, C., Keller, P., Paciorek, W., Sharff, A.,
> > Vonrhein, C., Bricogne, G. (2018). STARANISO. Cambridge, United
> > Kingdom: Global Phasing Ltd.
> >
> > https://www.jiscmail.ac.uk/cgi-bin/wa-jisc.exe?A2=ind1806&L=CCP4BB&O=D&P=3971
> >
> >[2] https://staraniso.globalphasing.org/
> >
> >[3] https://doi.org/10.1107/s0907444911007773
> > https://www.globalphasing.com/autoproc/
> >
> >
> >########################################################################
> >
> >To unsubscribe from the CCP4BB list, click the following link:
> >https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> >
> >This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing
> >list hosted by www.jiscmail.ac.uk, terms & conditions are available at
> >https://www.jiscmail.ac.uk/policyandsecurity/