Re: [ccp4bb] Structure prediction - waiting to happen

Guenter Fritz Fri, 05 May 2023 05:27:43 -0700

Dear all,

taking AlphaFold models for "true" experimental structures seems to be becoming a serious problem. I am just returning from a meeting (not a structural biology meeting) and saw one model after another. And the non-structural biologists used terms like "we calculated a structure" or "an AlphaFold crystal structure" or "the structure was accurate, it was all blue". AlphaFold models were used to predict ion channels, electron transfer pathways, enzyme mechanisms, and yes, not tested by experiments. Wide-ranging conclusions were drawn on these pure models, which I would not dare to draw on limited experimental data.

There is something going severely wrong.

And don't get me wrong, I think AlphaFold and other prediction software is great to create testable models (like MR models) or try to figure out how proteins might assemble and so on. But I get the impression that too many of our colleagues got the impression that pressing a button replaces experiments. Seeing that grants are rejected by such arguments is alarming and we should do something.


Best wishes,
Guenter

Very sorry to hear about your grant. I've been there. It is crushing to be rejected, and frustrating when the reason given is ... wrong.
My journalist friends wonder why scientists don't like talking to journalists. This is why. I remember when the first results from XFELs were published, and it was immediately declared that there was no longer a need for NASA, whose sole purpose (apparently) was to grow bigger and better crystals in space. (?!) I find the idea that AlphaFold has eliminated the need to solve any more structures equally ludicrous.
I think the best analogy for what has happened in structural biology is the same impact a Star Trek style "transporter device" would have on your daily commute. Except this "transporter" is only accurate enough to get you within a mile or two of your house. Most of the time. Don't worry, it's not going to beam you inside a rock or into the sky, as it was trained on data with good Clashscores (we think). But, you are on your own getting the rest of the way home. This "Last Mile" of transportation networks is actually the most challenging, and expensive, but also the most critical. In structural biology, the "Last Angstrom" between prediction and actuality is equally important, but also fraught with difficulty. It may seem like a short distance, until you have to walk it. So, despite amazing progress, it is still premature to dismantle infrastructure, and definitely a bad idea to nail your front door shut.
Personally, I see this structure prediction revolution as nothing more nor less than the fruition of Structural Genomics. It started in the final days of the 20th century. I was there! The stated goal of that worldwide initiative was to create the data set that would be needed by some future (at the time) homology modelling technology to do exactly what AlphaFold does: get us "close enough". And then Greg Petsko asked: what is "close enough"? He called it "The Grail problem". By what metric do you declare victory? He made an excellent suggestion:
"But there is an obvious method of evaluation that will allow any structure prediction method to be assessed. It is simply to demand that the method produce a model that can be used to solve the corresponding protein crystal structure by the method of molecular replacement."
-Greg Petsko - June 9, 2000
https://doi.org/10.1186/gb-2000-1-1-comment002
This is the thing that just changed. Structure prediction has finally crossed the "G-P threshold". Not 100% of the time, but impressively often now, the predictions can be used for MR. This is a massively useful tool! Not the end of the field, but rather the beginning of an exciting new era where success rates skyrocket.
Scores like the GDT used in CASP were developed with this Grail Problem criterion in mind, and I think that is what John Moult and others meant when they said things that got quoted like this: "Scores above 90 on the 100-point scale are considered on par with experimental methods, Moult says."
https://www.science.org/doi/full/10.1126/science.370.6521.1144
Meaning that the predicted models work as search models for MR about as often as search models derived from homologous (and yes, "experimentally determined") structures. A GDT of 100 does NOT mean the model is better than the data. That is not even how it works.
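For readers outside the field, here is a minimal sketch of what a GDT-style score measures. It assumes the commonly quoted GDT_TS definition (the average, over cutoffs of 1, 2, 4 and 8 Angstrom, of the percentage of C-alpha atoms lying within that cutoff of the reference) and it assumes the two coordinate lists are already superposed and matched atom by atom. The real CASP calculation searches over many superpositions for each cutoff, so this single-superposition version can only underestimate it. The function name and the toy coordinates are illustrative, not taken from any CASP software.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two pre-superposed lists of (x, y, z) CA coordinates.

    Assumption: one fixed superposition, atoms matched by list position.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-atom distance between the model and the reference position.
    distances = [math.dist(m, r) for m, r in zip(model, reference)]
    # Fraction of atoms within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four CA atoms displaced by 0.5, 1.5, 3.0 and 10.0 Angstrom.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(model, reference))  # 56.25
```

Note that the score only counts atoms inside fairly generous distance bins. An atom that is 0.9 Angstrom off counts exactly the same as one that is 0.01 Angstrom off, which is why a high GDT says nothing about agreement with the diffraction data.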
But, unfortunately, this seems to have gotten paraphrased and sensationalized:
"generally considered to be competitive with the same results obtained via experimental methods"
https://www.sciencealert.com/ai-solves-50-year-old-biology-grand-challenge-decades-before-experts-predicted
"software predictions finally match structures calculated from experimental data"
https://www.science.org/doi/full/10.1126/science.370.6521.1144

"comparable in quality to experimental structures"
https://www.nature.com/articles/d41586-020-03348-4

"accuracy comparable to laboratory experiments"
https://www.bbc.com/news/science-environment-55133972

<sigh>
The only kind of diffraction where prediction is better than experiment is that of monoatomic gases. These curves can be derived very accurately and completely from fundamental constants of physics. This is where those tables of atomic scattering factors used by refinement programs come from. For a while, the experimentally measured curves were used, but once Hartree, Fock, Slater, Cromer, Mann and others worked out how to do the self-consistent field calculations accurately, by the late 1960s the calculated form factors supplanted the measured ones.
You might also say that for "small molecule" crystals the models are better than the data. Indeed, the CSD did not require experimental data to be deposited until fairly recently. The coordinates were considered more accurate than the intensities because publication requirements for chemical crystallography R factors are low enough to be dominated by experimental noise only. Nevertheless, despite the phase problem being cracked by direct methods in the 1980s, your local chemistry department has yet to shut down their diffractometer. Why? Because they need it. And for macromolecular structures, the systematic errors between refined coordinates and their corresponding data are about 4-5x larger than experimental error. So, don't delete your image data! Not for a while yet.
-James Holton
MAD Scientist


On 4/1/2023 7:57 AM, Subramanian, Ramaswamy wrote:
Ian,

Thank you.  This is not an April Fool's.
Rams
subra...@purdue.edu
On Apr 1, 2023, at 10:46 AM, Ian Tickle <ianj...@gmail.com> wrote:
Hi Ramaswamy
I assume this is an April Fool's but it's still a serious question because many reviewers who are not crystallographers or electron microscopists may not fully appreciate the difference currently between the precision of structures obtained by experimental and predictive methods, though the latter are certainly catching up. The answer of course lies in the mean co-ordinate precision, related to the map resolution.
Quoting https://people.cryst.bbk.ac.uk/~ubcg05m/precgrant.html :
"The accuracy and precision required of an experimentally determined model of a macromolecule depends on the biological questions being asked of the structure. Questions involving the overall fold of a protein, or its topological similarity to other proteins, can be answered by structures of fairly low precision such as those obtained from very low resolution X-ray crystal diffraction data [or AlphaFold]. Questions involving reaction mechanisms require much greater accuracy and precision as obtained from well-refined, high-resolution X-ray structures, including proper statistical analyses of the standard uncertainties (s.u.'s) of atomic positions and bond lengths."
According to https://www.nature.com/articles/s41586-021-03819-2 :
The accuracy of AlphaFold structures at the time of writing (2021) was around 1.0 Ang. RMSD for main-chain and 1.5 Ang. RMSD for side-chain atoms and probably hasn't changed much since. This is described as "highly accurate"; however this only means that AlphaFold's accuracy is much higher in comparison with other prediction methods, not in comparison with experimental methods. Also note that AlphaFold's accuracy is estimated by comparison with the X-ray structure which remains the "gold standard"; there's no way (AFAIK) of independently assessing AlphaFold's accuracy or precision.
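For reviewers unfamiliar with the metric, a minimal sketch of how an RMSD between a predicted and an experimental model is computed. It assumes the two coordinate sets are already superposed and matched atom by atom; the function name and the toy coordinates are illustrative only and are not taken from the AlphaFold paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input) between matched atoms.

    Assumption: coordinates are pre-superposed and paired by list position.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared per-atom displacements.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: two atoms, each displaced by exactly 1.0 Angstrom.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(predicted, experimental))  # 1.0
```

Because the deviations are squared before averaging, a few badly placed atoms dominate the value, and the number is only meaningful relative to the reference structure it was measured against.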
Quoting https://scripts.iucr.org/cgi-bin/paper?S0907444998012645 :
"Data of 0.94 A resolution for the 237-residue protein concanavalin A are used in unrestrained and restrained full-matrix inversions to provide standard uncertainties sigma(r) for positions and sigma(l) for bond lengths. sigma(r) is as small as 0.01 A for atoms with low Debye B values but increases strongly with B."
There's a yawning gap between 1.0 - 1.5 Ang. and 0.01 Ang.! Perhaps AlphaFold structures should be deposited using James Holton's new PDB format (now that is an April Fool's!).
One final suggestion for a reference in your grant application: https://www.biorxiv.org/content/10.1101/2022.03.08.483439v2 .
Cheers

-- Ian
On Sat, 1 Apr 2023 at 13:06, Subramanian, Ramaswamy <subra...@purdue.edu> wrote:
    Dear All,

    I am unsure if all other groups will get it - but I am sure this
    group will understand the frustration.

    My NIH grant did not get funded.  A few genuine comments - they
    make excellent sense.  We will fix that.

    One major comment is, “Structures can be predicted by AlphaFold
    and other software accurately, so the effort put on the grant to
    get structures by X-ray crystallography/cryo-EM is not justified.”

    The problem is when a company with billions of $$s develops a
    method and blasts it everywhere - the message is so pervasive…

    *Question:* Is there a canned consensus paragraph that one can
    add with references to grants with structural biology
    (especially if the review group is not a structural biology
    group) to say why the most modern structure prediction programs
    are not a substitute for structural work?

    Thanks.


    Rams
    subra...@purdue.edu




    ------------------------------------------------------------------------

    To unsubscribe from the CCP4BB list, click the following link:
    https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
    <https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
########################################################################

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
