Re: [ccp4bb] About Staraniso

Clemens Vonrhein Mon, 12 Feb 2024 08:24:51 -0800

Dear Arpita,

> Apologies if the below query seems very naive!


Your query is not at all naive, it is very probing.  Sorry for the
necessarily long reply to your questions - but there are a number of topics
you raise where we think a large amount of confusion still exists.  Please
note that this reply represents /our/ view of things only of course.

> This is to query on the consensus to use Staraniso for pdb submission. We
> have solved a structure previously at 2.3 A resolution.

So you had a dataset where you decided on a sphere in reciprocal space with
a radius corresponding to 2.3A resolution as a cut-off surface - based on
some kind of local analysis that convinced you that all the measured
reflections within that sphere (i.e. 2.3A and lower resolution) are
observed and should be kept, while all measured reflections outside that
sphere should be regarded as unobserved and can be discarded as pure noise.
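A minimal Python sketch of such an isotropic cut-off, for illustration only: it assumes an orthogonal cell and reflections stored as hypothetical (h, k, l, I, sigI) tuples, and the function names are made up. The keep/discard decision depends only on d, never on the direction of the reflection or on its signal.

```python
import math

def d_spacing(h, k, l, cell):
    """Resolution d (A) of reflection hkl, assuming an orthogonal cell
    (all angles 90 degrees) with edges a, b, c in A."""
    a, b, c = cell
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

def spherical_cutoff(reflections, cell, d_min):
    """Keep reflections with d >= d_min, i.e. inside a sphere of radius 1/d_min
    in reciprocal space; everything outside is treated as noise and dropped."""
    return [r for r in reflections if d_spacing(r[0], r[1], r[2], cell) >= d_min]

refl = [(10, 0, 0, 120.0, 5.0), (21, 0, 0, 8.0, 3.0), (22, 0, 0, 4.0, 3.0)]
kept = spherical_cutoff(refl, (50.0, 60.0, 70.0), 2.3)
print([r[0] for r in kept])  # [10, 21]: d = 5.00A and 2.38A kept, 2.27A dropped
```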

> The same data (after reindexing the diffraction images in autoPROC) and
> after reprocessing by ellipsoidal scaling in Staraniso gave structure at
> ~2.16 A.

OK, first some clarification:

  * The scaling of the unmerged reflection data in autoPROC (using AIMLESS)
    is neither spherical nor ellipsoidal in itself: it uses the data as it
    is with the typical scale parameterisation in AIMLESS, i.e. a scale k
    and an image B-factor (plus some absorption), all with default
    smoothing.

    This then leads to two output reflection files:

     aimless_alldata_unmerged.mtz  = Scaled and unmerged reflections without cut-off.
     aimless_alldata.mtz           = Scaled and merged reflections without cut-off.

  * The latter (scaled and merged reflection data without any cut-off) is
    then given to STARANISO to do the following:

    (a) Compute various local statistics that are then used to define a
        cut-off surface.

    (b) Assume that all reflections within that cut-off surface should be
        kept (and could have been observed) and all those outside should be
        ignored.

    ==> See how that is extremely similar to the type of analysis you did
        with the initial 2.3A data?  One type of analysis (local 1D-shells
        of data in d*) led to an isotropic sphere as a cut-off surface,
        while another (local 3D-spheres in reciprocal space) led to an
        anisotropic cut-off surface.

        Remember that "anisotropic" just means "not isotropic" - it doesn't
        mean "ellipsoidal" (diffraction from a cubic crystal can be
        anisotropic since the [100], [110] and [111] directions have quite
        different properties, yet attempts to fit an ellipsoid to it will
        produce a sphere).  The cut-off surface assigned this way by
        STARANISO can have any shape really (including being a sphere)
        because the analysis via local spheres doesn't assume/enforce
        isotropy - while the analysis via spherical shells does.

        So up to that point there is no difference really between the two
        approaches: using a criterion to define a cut-off surface and
        considering data within the surface as observable and data outside
        as unobservable. It is only the assumptions on which the criterion
        is based that differ: one assumes the data is isotropic, while the
        other doesn't.

    ==> The notion of "resolution" is a bit complicated in general here: if
        your crystal diffracted better in some directions than in others, a
        better description is the use of "diffraction limits" in specific
        directions - e.g. defined by the principal axes of an ellipsoid
        fitted to the cut-off surface. This is what autoPROC/STARANISO
        provides.

    (c) Analyse the anisotropic fall-off in intensity of the data within the
        cut-off surface to derive anisotropic correction factors and apply
        them to the data.

        This is similar to the anisotropic scaling a refinement program
        would perform using the current model as a reference (to
        anisotropically scale the observed data to the model).  Here we
        apply an internal anisotropic scaling, without reference to any
        model.
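To make the contrast between spherical shells and local 3D spheres concrete, here is a deliberately naive Python sketch of a local-sphere criterion. It is /not/ STARANISO's actual algorithm: the orthogonal cell, the brute-force neighbour search, the (h, k, l, I, sigI) tuples, the radius and the I/sigI threshold are all illustrative assumptions.

```python
import math

def recip_vector(h, k, l, cell):
    """Reciprocal-space vector (1/A) of hkl, assuming an orthogonal cell."""
    a, b, c = cell
    return (h / a, k / b, l / c)

def local_sphere_cutoff(reflections, cell, radius, threshold):
    """Keep a reflection if the mean I/sigI of all reflections within `radius`
    (1/A) of it in reciprocal space reaches `threshold`. Each neighbourhood is
    a small 3D sphere, not a shell of constant d*, so no isotropy is assumed."""
    vecs = [recip_vector(h, k, l, cell) for h, k, l, _, _ in reflections]
    ratios = [i / sig for _, _, _, i, sig in reflections]
    kept = []
    for refl, v in zip(reflections, vecs):
        local = [r for r, w in zip(ratios, vecs) if math.dist(v, w) <= radius]
        if sum(local) / len(local) >= threshold:
            kept.append(refl)
    return kept

# Toy anisotropic data, cubic 50A cell: strong along a* (I/sigI = 3.0),
# weak along c* (I/sigI = 0.5), same d-spacings in both directions.
refl = ([(h, 0, 0, 30.0, 10.0) for h in range(18, 25)] +
        [(0, 0, l, 5.0, 10.0) for l in range(18, 25)])
kept = local_sphere_cutoff(refl, (50.0, 50.0, 50.0), radius=0.05, threshold=1.5)
print(len(kept), all(r[2] == 0 for r in kept))  # 7 True: only the a* reflections survive
```

At d = 50/20 = 2.5A a shell-based analysis would average 3.0 and 0.5 to 1.75 and, with the same threshold, keep the weak c* reflections along with the strong a* ones.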

> The previously solved structure did not have significant anisotropy
> according to Aimless, so anisotropic scaling was not performed that time.

See above: you most likely did use anisotropic scaling during refinement
(with the model as the reference).
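Whether it is done against a model during refinement or internally as in step (c) above, anisotropic scaling comes down to fitting a direction-dependent fall-off and dividing it out. A toy Python sketch of the internal (model-free) variant, under simplifying assumptions that are ours and not STARANISO's: an orthogonal cell, a diagonal B tensor, intensities that would be flat without the fall-off (real programs compare against expected intensities instead), the convention I ~ scale*exp(-0.5 * s^T B s) with s = (h/a, k/b, l/c), and a plain least-squares fit to ln(I). Only the departure of each B from their mean is removed, so an isotropic fall-off remains.

```python
import math

def design_row(h, k, l, cell):
    """Regressors for ln(I) = ln(scale) - 0.5*(Baa*(h/a)^2 + Bbb*(k/b)^2 + Bcc*(l/c)^2)."""
    a, b, c = cell
    return [1.0, -0.5 * (h / a) ** 2, -0.5 * (k / b) ** 2, -0.5 * (l / c) ** 2]

def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x

def fit_diagonal_b(reflections, cell):
    """Least-squares fit (normal equations) of scale, Baa, Bbb, Bcc to ln(I)."""
    rows = [design_row(h, k, l, cell) for h, k, l, _ in reflections]
    y = [math.log(r[3]) for r in reflections]
    ata = [[sum(row[p] * row[q] for row in rows) for q in range(4)] for p in range(4)]
    aty = [sum(row[p] * yi for row, yi in zip(rows, y)) for p in range(4)]
    ln_scale, baa, bbb, bcc = solve(ata, aty)
    return math.exp(ln_scale), (baa, bbb, bcc)

def remove_anisotropy(reflections, cell, b_diag):
    """Divide out only the anisotropic part (each B minus the mean B)."""
    a, b, c = cell
    b_iso = sum(b_diag) / 3
    daa, dbb, dcc = (x - b_iso for x in b_diag)
    out = []
    for h, k, l, i in reflections:
        aniso = math.exp(-0.5 * (daa * (h / a) ** 2 + dbb * (k / b) ** 2 + dcc * (l / c) ** 2))
        out.append((h, k, l, i / aniso))
    return out

cell, true_b = (40.0, 50.0, 60.0), (20.0, 40.0, 60.0)  # hypothetical values

def synth(h, k, l):
    """Noise-free toy intensity with scale 100 and the 'true' diagonal B."""
    s = design_row(h, k, l, cell)
    return 100.0 * math.exp(sum(bi * si for bi, si in zip(true_b, s[1:])))

data = [(h, k, l, synth(h, k, l)) for h in range(9) for k in range(9) for l in range(9)]
scale, b_fit = fit_diagonal_b(data, cell)
print(round(scale, 3), [round(x, 3) for x in b_fit])  # 100.0 [20.0, 40.0, 60.0]

same_d = [(8, 0, 0), (0, 10, 0), (0, 0, 12)]  # all three at d = 5A
raw = [(h, k, l, synth(h, k, l)) for h, k, l in same_d]
fixed = remove_anisotropy(raw, cell, b_fit)
print([round(r[3], 2) for r in raw], [round(r[3], 2) for r in fixed])
# [67.03, 44.93, 30.12] [44.93, 44.93, 44.93]
```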

Please note that using AIMLESS alone is not the best way to detect
anisotropy; that is not its main purpose.  As far as we know, it looks only
along the crystal axes for anisotropy (whereas STARANISO looks in all
directions). That means it will not detect anisotropy eigenvectors lying
close to diagonals as can happen in monoclinic (a*-c* plane only since an
anisotropy eigenvector is constrained to be parallel to the b* axis) and
triclinic lattices.  So if AIMLESS says there is significant anisotropy you
can believe it; OTOH if it says no anisotropy was detected you should
definitely perform further checks.  Absence of evidence is not evidence of
absence!  In higher-symmetry lattices all eigenvectors are constrained to
be parallel to crystal axes so this problem doesn't arise and AIMLESS
should detect anisotropy correctly in those cases.
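A small numerical illustration of that blind spot, as a Python sketch: a hypothetical anisotropic B tensor in the a*-c* plane whose principal axes lie along the diagonals. For simplicity the tensor is written in an orthonormal Cartesian frame within that plane (in a real monoclinic cell a* and c* are not perpendicular), and nothing here describes how AIMLESS actually implements its anisotropy analysis.

```python
import math

def directional_b(b_tensor, angle_deg):
    """B value along a unit vector at angle_deg in a 2D plane: u^T B u.
    b_tensor = ((b11, b12), (b12, b22)) in an orthonormal frame (assumption)."""
    t = math.radians(angle_deg)
    ux, uy = math.cos(t), math.sin(t)
    (b11, b12), (_, b22) = b_tensor
    return b11 * ux * ux + 2 * b12 * ux * uy + b22 * uy * uy

def principal_b(b_tensor):
    """Eigenvalues (principal B values) of a symmetric 2x2 tensor."""
    (b11, b12), (_, b22) = b_tensor
    mean = (b11 + b22) / 2
    half_gap = math.hypot((b11 - b22) / 2, b12)
    return mean - half_gap, mean + half_gap

# Hypothetical tensor: identical along both axes, anisotropic along the diagonals.
B = ((40.0, 15.0), (15.0, 40.0))
for angle in (0, 90, 45, 135):
    print(angle, round(directional_b(B, angle), 1))  # 40.0, 40.0, 55.0, 25.0
print(principal_b(B))                                  # (25.0, 55.0)
```

Looking only along the two axes, this data would appear perfectly isotropic, while the principal B values differ by 30.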

> The overall spherical completeness of Staraniso structure is low (~73%)
> while Ellipsoidal completeness is ~94%.

Let's see what "completeness" means: what percentage of observable
reflections did we actually observe?  So the important point is to define
"observable" ... which you have done above already! The cut-off surface
defines the region in reciprocal space that you deemed "observable",
i.e. either a sphere (assuming isotropy) or a general cut-off surface (that
can be simplified through a fitted ellipsoid).

If you want to judge the completeness for data analysed by STARANISO, you
can /not/ use the spherical completeness: the completeness computation
needs to be done with the cut-off surface employed - which in the case of
STARANISO data is the anisotropic cut-off surface (simplified through a
fitted ellipsoid).  So you /have/ to use the ellipsoidal completeness as a
measure here.
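The point can be made with a toy Python sketch that computes completeness for the same set of observed reflections against two different cut-off surfaces. Assumptions, all ours: an orthogonal cell, an axis-aligned ellipsoid given by hypothetical diffraction limits along a*, b*, c*, and no symmetry (every hkl counts separately).

```python
import itertools

def inside_ellipsoid(h, k, l, cell, limits):
    """True if hkl lies inside an axis-aligned ellipsoid in reciprocal space
    with semi-axes 1/d_a, 1/d_b, 1/d_c, where limits = (d_a, d_b, d_c) are the
    diffraction limits in A along a*, b*, c* (orthogonal cell assumed).
    A sphere is the special case d_a = d_b = d_c."""
    (a, b, c), (da, db, dc) = cell, limits
    return (h * da / a) ** 2 + (k * db / b) ** 2 + (l * dc / c) ** 2 <= 1.0

def possible_reflections(cell, limits):
    """All hkl (except 000) inside the cut-off surface; symmetry is ignored,
    so every hkl counts once."""
    d_best = min(limits)
    hmax, kmax, lmax = (int(edge / d_best) + 1 for edge in cell)
    grid = itertools.product(range(-hmax, hmax + 1),
                             range(-kmax, kmax + 1),
                             range(-lmax, lmax + 1))
    return {hkl for hkl in grid
            if hkl != (0, 0, 0) and inside_ellipsoid(*hkl, cell, limits)}

def completeness(observed, cell, limits):
    """Fraction of the reflections inside the cut-off surface that were observed."""
    possible = possible_reflections(cell, limits)
    return len(possible & set(observed)) / len(possible)

cell = (20.0, 20.0, 20.0)
ellipsoid = (2.0, 4.0, 4.0)  # hypothetical limits: 2.0A along a*, 4.0A along b* and c*
observed = possible_reflections(cell, ellipsoid)  # pretend every observable hkl was measured
print(f"ellipsoidal: {completeness(observed, cell, ellipsoid):.2f}")        # 1.00
print(f"spherical:   {completeness(observed, cell, (2.0, 2.0, 2.0)):.2f}")  # near the volume ratio 0.25
```

The data are identical in both calls; only the surface used to define "observable" changes.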

Using the spherical completeness for data that went through STARANISO is
the same as pretending a crystal diffracts to 1.0A when it only ever gives
observable intensities to, say, 1.5A - and still computing completeness to
the 1.0A limit. The overall completeness would be ridiculously low, and
no-one would mix two such different cut-off surfaces (one representing
actual data, the other some over-optimistic assumption) in that kind of
computation, right?  In the same way, we
shouldn't pretend that an anisotropically diffracting crystal could provide
us with all observations within a sphere in reciprocal space: there are no
observations in certain directions (in the same way that there aren't any
observations at 1.0A if the crystal only diffracts to 1.5A).
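How low is "ridiculously low"? Simple arithmetic, sketched in Python under the assumption of isotropic diffraction and a fully measured sphere: the number of reciprocal lattice points inside a sphere of radius 1/d grows as (1/d)^3.

```python
def apparent_completeness(d_actual, d_assumed):
    """Completeness when every reflection observable to d_actual is measured,
    but completeness is computed to the more optimistic limit d_assumed.
    Assumes isotropy: lattice points inside a sphere of radius 1/d scale as (1/d)**3."""
    return (d_assumed / d_actual) ** 3

print(round(apparent_completeness(1.5, 1.0), 3))  # 0.296, i.e. under 30%
```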

The bottom line is: if the ellipsoidal completeness is significantly higher
than the spherical one, the data show significant anisotropy.

> Parallel isotropic scaling gives structure with 99.6% completeness (but
> 2.3 A resolution).

Remember that completeness computations only look at Miller indices
(reciprocal lattice points): it is the responsibility of whoever calls that
computational step to provide only reciprocal lattice points with actual
observations ... we are after all interested in knowing what fraction of
the possible data we collected (and not how many HKL values we have in a
file).

It might be easier to visualise those very simple concepts by looking at

  https://staraniso.globalphasing.org/anisotropy_about.html

> The statistics (R merge and others) are better for Staraniso structure
> (also benefited from removing specific frames with high R merge as
> indicated by Staraniso).

Are you sure you removed images based on Rmerge values in STARANISO? -
autoPROC is doing a fair bit of analysis to determine poor image ranges
(not based on R-values though) ... so maybe you mean that feature?

> Also the interatomic distances in regions of interest in the staraniso
> structure is on par with parallel molecular dynamics simulation data.

So your model is "better" in terms of explaining some other, externally
determined results when using autoPROC/STARANISO data?  Very good.


> The questions are:
>
> 1. Can the Staraniso structure be submitted to pdb saying reprocessed
> structure at higher resolution (through Staraniso)?

Your model is more meaningful as judged by external information (MD
simulations) ... so why should one not deposit the data as-is when it
clearly was instrumental in providing you that added information?
Remember: anything happening in STARANISO is done without seeing anything
of your model (so no model bias is possible at all)!  It would be
effectively a new deposition with different data, as opposed to a
re-refinement of the structure with the same data, so yes.  Unless you plan
to obsolete the first deposition, you should make a reference to it
explaining how it's related.

As part of autoPROC/STARANISO processing, we are providing a
deposition-ready mmCIF file that contains multiple datablocks - since
different downstream programs and methods might require different stages of
data processing and analysis. For the full history and background, please
see:

 * "Introduce Global Phasing Extensions to v50 dictionary" (February 2021):
   https://github.com/wwpdb-dictionaries/mmcif_pdbx/commit/81a037c4bac0ccebdd8772717857d3527cb47db3

 * "Improved support for extended PDBx/mmCIF structure factor files"
   (January 2022): https://www.rcsb.org/news/feature/61df48320fea311d064aa4de

 * https://www.globalphasing.com/buster/wiki/index.cgi?DepositionMmCif

 * https://www.wwpdb.org/deposition/preparing-pdbx-mmcif-files
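If you want to check which data blocks such a file contains before depositing, a rough Python sketch is enough. It relies only on the CIF rules that each data block starts with a "data_" header and that multi-line text fields are delimited by a ";" in the first column. It is not a full CIF parser (use a proper one for real work), and the block names in the example are made up.

```python
def list_data_blocks(cif_text):
    """Names of the data blocks in a CIF/mmCIF text (the part after 'data_').
    Lines inside semicolon-delimited text fields are skipped so that their
    content is not mistaken for a block header."""
    names, in_text_field = [], False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if not in_text_field and line[:5].lower() == "data_":
            names.append(line.split()[0][5:])
    return names

# Hypothetical example; real block names are chosen by the software and the wwPDB.
example = """data_blockA
_cell.length_a 50.0
;
data_inside_a_text_field
;
data_blockB
"""
print(list_data_blocks(example))  # ['blockA', 'blockB']
```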

> 2. What is the factor more important for a structure: completeness
> (spherical vs ellipsoidal) or R statistics?

A model is better if the X-Ray data it is refined against contains all the
information and leaves out pure noise. Obviously, that is an ideal that is
hard to achieve - so during processing we try to define a cut-off surface
that will include most signal and exclude most noise.  That approach is
taken both for isotropically diffracting crystals and for anisotropically
diffracting ones.

Completeness (see discussion above) is important, handling of poor image
ranges can help (crystal moving out of beam etc), R-values are just numbers
(especially ignore Rmerge!), <I/sigI> and CC_1/2 (the latter except for
significantly anisotropic data) are good metrics ... but again: these are
just numbers.

If the density is clearer because the model refinement works better /and/
the model interpretation is more meaningful as judged by external results:
that's all that matters, right?  Everything else is just numbers ... but if
you are looking at numbers: the ellipsoidal completeness out to the cut-off
based on the local average I/sigma(I) is far more important than the
merging stats (Rs and CC_1/2), as seems to be confirmed by your gratifying
observation that the interatomic distances in regions of interest from the
data processed by autoPROC+STARANISO accord with expectations. The merging
statistics are not very reliable indicators of data quality; CC_1/2 in
particular can be misleading in the presence of significant anisotropy,
because the anisotropic fall-off common to both half-sets can dominate the
correlation between them - even if the data themselves don't agree
particularly well.
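The CC_1/2 effect can be reproduced with a toy simulation in Python. Everything here is an assumption made for illustration: reflections in one resolution shell with a mean true intensity of 10 and little real variation (sd 1), independent half-set noise (sd 5), and, in the anisotropic case, a direction-dependent fall-off factor between 0.05 and 1 that both half-sets share. The half-sets agree equally poorly in both cases, but the shared fall-off alone pushes CC_1/2 up considerably.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_half_sets(n, anisotropic, seed=0):
    """Toy half-data-sets for n reflections in one resolution shell.
    If `anisotropic`, each reflection is multiplied by a fall-off factor
    shared by both half-sets. All numbers are invented for illustration."""
    rng = random.Random(seed)
    half1, half2 = [], []
    for _ in range(n):
        true_i = rng.gauss(10.0, 1.0)
        f = rng.uniform(0.05, 1.0) if anisotropic else 1.0
        half1.append(f * (true_i + rng.gauss(0.0, 5.0)))
        half2.append(f * (true_i + rng.gauss(0.0, 5.0)))
    return half1, half2

for aniso in (False, True):
    h1, h2 = simulate_half_sets(5000, aniso)
    print(f"anisotropic={aniso}: CC1/2 = {pearson(h1, h2):.2f}")
```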

> 3. Why is the extra resolution not detected during indexing by iMOSFLM or
>    XDS (using default setup)? The indexed outputs of either of them did
>    not give extra resolution (through anisotropic scaling) in Staraniso,
>    although it said some data was missing.

It is not clear what you mean here: iMOSFLM and XDS do not apply
diffraction cut-offs to the data by default - unless those defaults have
been changed by whatever program/tool is driving those programs (maybe done
via some automatic processing pipelines at a synchrotron beamline?).  You
might want to check again how (and by what system) those programs were run:
nothing prevents them from also outputting the scaled+merged data without
any cut-off (which could then go into STARANISO).

If STARANISO (through the STARANISO webserver) complained about missing
data, then some kind of cut-off was applied before the data was given to
STARANISO.  Most likely an isotropic cut-off (a sphere as the cut-off
surface) was used - resulting in exclusion of observable data in the
well-diffracting direction(s) and inclusion of noise in the poorly
diffracting direction(s).

> 4. Is there any option for using all reflections detection (like
>    autoPROC) in iMOSFLM or XDS?

It is not clear what you mean by 'all-reflections detection':
MOSFLM/iMOSFLM and XDS already output all valid reflections. Maybe you
could clarify that point further?

Hope this helps, get back to us if not.

Regards

Clemens, Ian & Gerard (for the autoPROC+STARANISO team)

########################################################################


This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
