And there was Crambin to 0.48Å (I’ll leave it to others to argue whether or not 
cramin is a protein, since it has “only” 46 amino acids) where (from memory, I 
haven’t read the paper for at least 20 nyears…) they modelled multiple water 
networks.

3NIR, for reference.

Harry


> On 20 Mar 2023, at 10:20, Eleanor Dodson 
> <0000176a9d5ebad7-dmarc-requ...@jiscmail.ac.uk> wrote:
> 
> Thank you for such a careful analysis of modelling a "true" structure. You 
> should publish this James.
> 
> It shows amongst other things, how R factors depend on our modelling of 
> solvent which is not represented as individual atoms (And also I think on how 
> the scales are derived between observation and calculation.)
> Years ago someone refined vitamin B12 against high resolution (0.6A?) data. 
> There is about 20-25% solvent volume I think..t was clear in the maps that 
> there were partially occupied networks of water which extended throughout the 
> lattice. This is probably true for proteins as well, and must affect the 
> conformations of surface sidechains? 
> Eleanor
> The B12creference ...
> 
> Biophys J. 1986 Nov; 50(5): 967–980.
> doi: 10.1016/S0006-3495(86)83537-8
> PMCID: PMC1329821
> PMID: 3790697
> Water structure in vitamin B12 coenzyme crystals. II. Structural 
> characteristics of the solvent networks.
> 
> H Savage
> 
> On Sun, 19 Mar 2023 at 19:37, James Holton <jmhol...@lbl.gov> wrote:
> They say one test is worth a thousand expert opinions, so I tried my hand at 
> the former.
> 
> The question is: what is the right way to treat disordered side chains?:
> a) omit atoms you cannot see
> b) build them, and set occupancy to zero
> c) build them, and "let the B factors take care of it"
> d) none of the above
> 
> The answer, of course, is d).
> 
> Oh, c'mon.  Yes, I know one of a,b, or c is what you've been doing your whole 
> life. I do it too.  But, let's face it: none of these solutions are perfect.  
> So, the real question is not which one is "right", but which is the least 
> wrong?  
> 
> We all know what is really going on: the side chain is flapping around. No 
> doubt it spends most of its time in energetically reasonable but nevertheless 
> numerous conformations.  There are 41 "Favorable" rotamers for Lys alone, and 
> it doesn't take that many to spread the density thin enough to fall below the 
> classical 1-sigma contour level. The atoms are still there, they are still 
> contributing to the data, and they haven't gone far. So why don't we "just" 
> model that?  Already, I can hear the cries of "over-fitting!" and 
> "observations/parameters!", "model bias!", and "think of the children!"  
> Believe it or not, none of these are the major issue here. Allow me to 
> demonstrate:
> 
> Consider a simple case where we have a Lys side chain in ten conformers. I 
> chose from popular rotamers, but evenly spread. That is, all 10 conformers 
> have an occupancy of 0.10, and there is a 3-3-4 split of chi1 values between 
> minus, plus and trans.  This will give the maximum contrast of density 
> between CB and CG.  Let us further require that there is no strain in this 
> ground-truth. No stretched bonds, no tortured angles, no clashes, etc.  Real 
> molecules don't occupy such high-energy states unless they absolutely have 
> to.  Let us further assume that the bulk solvent works the way phenix models 
> it, which is a probe radius of 1.1 A for both ions and aliphatics and a 
> shrink radius of 0.9.  But, instead of running one phenix.fmodel job, I ran 
> ten: one for each conformer (A thru J).  To add some excitement, I moved the 
> main chain ~0.2 A in a random direction for each conformer. I then took these 
> ten calculated electron density maps (bulk solvent and all) and added them 
> together to form the ground truth for the following trials. Before 
> refinement, I added noise consistent with an I/sigma of 50 and cut the 
> resolution at 2.0 A. Wilson B is 50:
> 
> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
> 0.8943     9.05   10.60      5.9         stump at CB
> 0.9540     9.29   11.73      6.0         single conformer, zero occupancy
> 0.9471    10.35   15.04      5.1         single conformer, full occupancy, 
> refmac5
> 0.9523     9.78   15.61      4.9         single conformer, full occupancy, 
> phenix.refine
>    
> So, it would appear that the zero-occupancy choice "wins", but by the 
> narrowest of margins.  Here CCtrue is the Pearson correlation coefficient 
> between the ground-truth right-answer electron density and the 2fofc map 
> resulting from the refinement.  Rwork and Rfree are the usual suspects, and 
> fo-fc indicates the tallest peak in the difference map. Refinement was with 
> refmac unless otherwise indicated. I think we often forget that both phenix 
> and refmac restrain B factor values, not just through bonds but through 
> space, and they use rather different algorithms. Refmac tries to make the 
> histogram of B factors "look right", whereas phenix allows steeper gradients. 
> I also ran all 10 correct rotamers separately and picked the one with the 
> best CCtrue to show above. If you instead sort on Rfree (which you really 
> shouldn't do), you get different bests, but they are not much better (as low 
> as 10.5%).  So, the winner here depends on how you score.  CCtrue is the best 
> score, but also unfortunately unavailable for real data.
> 
>   It is perhaps interesting here that better CCtrue goes along with worse 
> Rfree. This is not what I usually see in experiments like this. Rather, what 
> I think is going on here is the system is frustrated. We are trying to fit 
> various square pegs into a round hole, and none of them fit all that well. 
> 
> In all cases here the largest difference peak was indicating another place to 
> put the Lys, so why not build into that screaming, 6-sigma difference peak?  
> Here is what happens when you do that:
> 
> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
> 0.8943     9.05   10.60      5.9         stump at CB
> 0.9580     9.95   11.60      6.4         stump at CG
> 0.9585    10.20   12.29      6.2         stump at CG, all 10 confs
> 0.9543    10.61   12.24      5.3         stump at CD, all 10 confs
> 0.9383    10.69   14.64      4.1         stump at CE, all 10 confs
> 0.9476     9.66   13.48      4.6         all atoms, all 10 confs
> 0.9214     7.09    11.8      5.6         three conformers (worst of 120 
> combos)
> 0.9718     6.53    8.55      4.3         three conformers (best of 120 combos)
> 0.9710     7.17    9.44      6.1         two conformers (best of 45 combos)
> 0.9471    10.35   15.04      5.1         single conformer (best of 10 choices)
> 
> If I add one CG, the other two chi1 positions light up.  So, I tried building 
> in all 10 true CG positions, and let the refinement decide what to do with 
> them. The clear indication was that a CD should be added. After adding all 
> the CDs, the difference peaks were weaker, but still indicating more atoms 
> were needed.  Rwork and Rfree, however, tell the opposite story.  They get 
> worse the more atoms you add.  CCtrue, on the other hand, was best when 
> cutting everything after CG.  Why is that?  Well, every time you add another 
> atom you fill in the difference density, but then that atom pushes back the 
> bulk solvent model that was filling in the density for the next atom.  The 
> atom-to-solvent distance is roughly twice that of a covalent bond.  So again, 
> square pegs and round holes.  
> 
> Three conformers coming out as the winner may be because it is a selective 
> process with a noisy score. In the ground truth there are 10 conformers at 
> equal occupancy, so no one triplet is really any better than any other. 
> However, one has a density shape that fits better than other combos. My 
> search over all possible quartets is still running.
> 
> But what if we got the solvent "right"?  Well, here is what that looks like:
> 
> CCtrue   Rwork%  Rfree%   fo-fc(sigma)   description
> 0.9476     9.66   13.48      4.6         all atoms, all confs, refmac defaults
> 0.9696     6.15    8.88      3.7         all atoms, all confs, phenix.refine
> 0.9825     0.80    0.89      3.9         all atoms, all confs, true solvent   
> 0.9824     0.92    1.26      7.3         true model, minus one H atom from 
> ordered HIS side chain
> 
> You can see that the default solvent of phenix.refine fares better than 
> refmac here, but since I generated the solvent with phenix refine it may have 
> an unfair advantage. Nevertheless, providing the "true solvent" here is quite 
> a striking drop in R factors.  This is not surprising since this was the last 
> systematic error in this ground truth.  In all cases, I provided the true 
> atomic positions at the start of refinement, so there was no confusion about 
> strain-inducing local minima, such as which rotamer goes with which main 
> chain shift.  And yes, you can provide arbitrary bulk solvent maps to refmac5 
> using the "Fpart" feature.  I've had good luck with real data using bulk 
> density derived form MD simulations.
> 
> What is more, once the R factors are this low I can remove just one hydrogen 
> atom and it comes back as a 7.3-sigma difference peak. This corresponds to 
> the protonation state of that His.  This kind of sensitivity is really 
> attractive if you are looking for low-lying features, such as 
> partially-occupied ligands.  Some may pooh-pooh R factors as "cosmetic" 
> features of structures, but they are, in fact, nothing more or less than the 
> % error between your model and your data.  This % error translates directly 
> into the noise level of your map.  At 20% error there is no hope whatsoever 
> of seeing 1-electron changes. This is because hydrogen is only 17% of a 
> carbon.  But 3-5% error, which is a typical experimental error in 
> crystallographic data, anything bigger than one electron is clear.
> 
> -James Holton
> MAD Scientist
> 
> 
> 
> On 3/18/2023 2:10 PM, Nicholas Pearce wrote:
>> Not stupid, but essentially the same as modelling alt confs, though would 
>> probably give more overfitting. Alt confs can easily be converted to an 
>> ensemble (if done properly…). 
>> 
>> Thanks,
>> Nick
>> 
>> ———
>> 
>> Nicholas Pearce
>> Assistant Professor in Bioinformatics & DDLS Fellow
>> Linköping University
>> Sweden
>> 
>> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of benjamin bax 
>> <ben.d.v....@gmail.com>
>> Sent: Saturday, March 18, 2023 10:07:26 PM
>> To: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
>> Subject: Re: [ccp4bb] To Trim or Not to To Trim
>>  
>> Hi,
>> Probably a stupid question. 
>> Could you multiply a, b and c cell dimensions by 2 or 3 (to give 8 or 27 
>> structures) and restrain well defined parts of structure to be ‘identical’ ? 
>> To give you a more NMR like chemically sensible ensemble of structures?
>> Ben
>> 
>> 
>> > On 18 Mar 2023, at 12:04, Helen Ginn <ccp...@hginn.co.uk> wrote:
>> > 
>> > Models for crystallography have two purposes: refinement and 
>> > interpretation. Here these two purposes are in conflict. Neither case is 
>> > handled well by either trim or not trim scenario, but trimming results in 
>> > a deficit for refinement and not-trimming results in a deficit for 
>> > interpretation.
>> > 
>> > Our computational tools are not “fixed” in the same way that the standard 
>> > amino acids are “fixed” or your government’s bureaucracy pathways are 
>> > “fixed”. They are open for debate and for adjustments. This is a fine 
>> > example where it may be more productive to discuss the options for making 
>> > changes to the model itself or its representation, to better account for 
>> > awkward situations such as these. Otherwise we are left figuring out the 
>> > best imperfect way to use an imperfect tool (as all tools are, to varying 
>> > degrees!), which isn’t satisfying for enough people, enough of the time.
>> > 
>> > I now appreciate the hypocrisy in the argument “do not trim, but also 
>> > don’t model disordered regions”, even though I’d be keen to avoid 
>> > trimming. This discussion has therefore softened my own viewpoint.
>> > 
>> > My refinement models (as implemented in Vagabond) do away with the concept 
>> > of B factors precisely for the anguish it causes here, and refines a 
>> > distribution of protein conformations which is sampled to generate an 
>> > ensemble. By describing the conformations through the torsion angles that 
>> > comprise the protein, modelling flexibility of a disordered lysine is 
>> > comparatively trivial, and indeed modelling all possible conformations of 
>> > a disordered loop becomes feasible. Lysines end up looking like a frayed 
>> > end of a rope. Each conformation can produce its own solvent mask, which 
>> > can be summed together to produce a blurring of density that matches what 
>> > you would expect to see in the crystal.
>> > 
>> > In my experience this doesn’t drop the R factors as much as you’d assume, 
>> > because blurred out protein density does look very much like solvent, but 
>> > it vastly improves the interpretability of the model. This also better 
>> > models the boundary between the atoms you would trim and those you’d leave 
>> > untrimmed, by avoiding such a binary distinction. No fear of trimming and 
>> > pushing those errors unseen into the rest of the structure. No fear of 
>> > leaving atoms in with an inadequate B factor model that cannot capture the 
>> > nature of the disorder.
>> > 
>> > Vagabond is undergoing a heavy rewrite though, and is not yet ready for 
>> > human consumption. Its first iteration worked on 
>> > single-dataset-single-model refinement, which handled disordered side 
>> > chains well enough, with no need to decide to exclude atoms. The heart of 
>> > the issue lies in main chain flexibility, and this must be handled 
>> > correctly, for reasons of interpretability and elucidating the biological 
>> > impact. This model isn’t perfect either, and necessitates its own 
>> > compromises - but will provide another tool in the structural biology 
>> > arsenal.
>> > 
>> > —-
>> > 
>> > Dr Helen Ginn
>> > Group leader, DESY
>> > Hamburg Advanced Research Centre for Bioorganic Chemistry (HARBOR)
>> > Luruper Chaussee 149
>> > 22607 Hamburg
>> > 
>> > ########################################################################
>> > 
>> > To unsubscribe from the CCP4BB list, click the following link:
>> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fcgi-bin%2FWA-JISC.exe%3FSUBED1%3DCCP4BB%26A%3D1&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qb0fv349eLSwriyNUYdYjYw7FvjshVdZcJ%2FfUO0L2UI%3D&reserved=0
>> > 
>> > This message was issued to members of 
>> > https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2FCCP4BB&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HClu5jptgnNShWKqRbtahao9debmn7YF2LDjS%2F53Ook%3D&reserved=0,
>> >  a mailing list hosted by 
>> > https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=aGjYsM25olFLtXkOd9XLOPMaLiafkInYWQgk%2BoT80YE%3D&reserved=0,
>> >  terms & conditions are available at 
>> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fpolicyandsecurity%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=KftyRhu9E%2F5FSnP%2B1dly6tUZc%2Bmg5x%2FWzoubM0RAZUI%3D&reserved=0
>> 
>> ########################################################################
>> 
>> To unsubscribe from the CCP4BB list, click the following link:
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fcgi-bin%2FWA-JISC.exe%3FSUBED1%3DCCP4BB%26A%3D1&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qb0fv349eLSwriyNUYdYjYw7FvjshVdZcJ%2FfUO0L2UI%3D&reserved=0
>> 
>> This message was issued to members of 
>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2FCCP4BB&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HClu5jptgnNShWKqRbtahao9debmn7YF2LDjS%2F53Ook%3D&reserved=0,
>>  a mailing list hosted by 
>> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.jiscmail.ac.uk%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=aGjYsM25olFLtXkOd9XLOPMaLiafkInYWQgk%2BoT80YE%3D&reserved=0,
>>  terms & conditions are available at 
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jiscmail.ac.uk%2Fpolicyandsecurity%2F&data=05%7C01%7Cnicholas.pearce%40LIU.SE%7Cb01b3fe2d210435bd43108db27f4d9fc%7C913f18ec7f264c5fa816784fe9a58edd%7C0%7C0%7C638147704962813881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=KftyRhu9E%2F5FSnP%2B1dly6tUZc%2Bmg5x%2FWzoubM0RAZUI%3D&reserved=0
>> 
>> To unsubscribe from the CCP4BB list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
>> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
> 

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/

Reply via email to