Re: [ccp4bb] what to do with disordered side chains
On 4/4/2011 2:15 PM, Jacob Keller wrote:
> I like your IMGATM proposal, but wouldn't it also potentially break some of the programs?

That depends on the program. Programs I write that read PDB files silently ignore keywords that they don't recognize. A model with IMGATM (or whatever keyword you standardize on) records would be interpreted as though those dummy atoms don't exist. If a program died because of them, or if the PDB consumer wanted to "see" the dummy atoms, the keywords could be replaced with ATOM using a text editor and a global substitute, and the user would be aware that there is something different about those atoms.

I would hope programs would be modified to do sensible things with the dummy atoms since they would have a clear indication that the atoms are indeed dummy. For a graphics program, maybe the bonds involving dummy atoms could be drawn at half brightness. They would be visible but clearly more ghost-like than the majority of atoms in the model. A refinement program could strip them out, perform the refinement, and rebuild them at the end, if needed, using WASNIAHC. I expect they would also be ignored completely in MR and homology modeling/comparison programs.

In fact, pretty much any use I would make of the PDB file would involve discarding all the dummy atoms, but with this scheme I could at least know for sure which atoms are fantasy and which were built based on density.

> Also--and this is a problem with deleting only sidechain atoms in general--it seems that many, myself included, might totally miss that an apparent "alanine" is really a trunco-lysine. What I like is that it does get around the problem of people over-interpreting bogus sidechains, but it falls short, perhaps, in misleading people about what residue is there. I, for one, would not feel that I had to click on all the alanines in a model to verify that they were not lysines, and would be surprised and puzzled for a while about why this ala said lys when I clicked on it. Wouldn't you be surprised? (Well, maybe not after this thread...)

I am surprised any time I see all the atoms in a lysine on the surface. "What could possibly be holding that thing in place?" is what jumps to my mind. When I see a side chain on the surface that ends at CB or CG I just assume it is something long and waving in the breeze. I guess it all depends on what you are used to looking at. With dummy atoms that are clearly labeled as such, the graphics programs can be programmed as I described above and we both would have the visual cues that we desire.

Another advantage of keeping the "dummy flag" separate from the occupancy and B factor fields is that these are then free to be used in the way they were intended. Numerous times I have built side chains that are visible to their end, but a second conformation ends at the CG. I split these side chains into A and B parts with a complete A and a partial B, and the group occupancies of A and B sum to 1.0. Now if you tell me that I have to build the entire B side chain and must flag the dummy atoms with occ=0.0 we have a problem. For the dummy atoms the occupancies don't sum to 1.0 any more. Logic tells me that the occupancy of the dummy atoms should be the same as all the real B atoms.

This particular case is a good example of why I don't like the idea of building complete side chains in the absence of density. If you are going to build out my B conformation you have to recognize that the reason I don't see density beyond the CG is that there is a B and C conformation for the next CD atom (remember I already have an A conformation for CD elsewhere). To make a logically complete side chain I need to build two dummy conformations for this residue and split my "real" CG, CB, and CA B conformation atoms with no way to decide the relative occupancies of the B and C conformations. That's a lot of complexity for a blurry bit of density. Hell, I have every reason to expect that there is a D conformation in there too - do I have to build that as well?

If you expect such a shrub to be built for every surface lysine, the IMGATM keyword and the program WASNIAHC would allow it to be generated and represented in an unambiguous and minimally confusing fashion. I wouldn't be happy having to add imaginary atoms to my models, but the representation meets my criteria, and I think it meets yours too.

Dale Tronrud

> JPK
>
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud wrote:
>> The definition of _atom_site.occupancy is
>>
>>    The fraction of the atom type present at this site.
>>    The sum of the occupancies of all the atom types at this site
>>    may not significantly exceed 1.0 unless it is a dummy site.
>>
>> When an atom has an occupancy equal to zero that means that the atom is NEVER present at that site - and that is not what you intend to say. Setting the occupancy to zero does not mean that a full atom is located somewhere in this area. Quite the opposite.
>>
>> (The refe
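
The record handling discussed in this message can be sketched in a few lines. This is only an illustration, assuming Python and a hypothetical IMGATM record laid out exactly like an ATOM record. IMGATM is the keyword proposed in this thread, not part of the PDB format, and the function names are invented for the example. It covers three behaviours mentioned above: passing unrecognized keywords through untouched, dropping the dummy records, and the "global substitute" of IMGATM with ATOM.

```python
# Sketch only: IMGATM is the record name proposed in this thread,
# not a keyword defined by the PDB format. All names here are hypothetical.
COORD_RECORDS = ("ATOM  ", "HETATM")  # record names occupy columns 1-6
DUMMY_RECORD = "IMGATM"               # proposed keyword (assumption)


def split_records(lines):
    """Sort lines into (real atoms, dummy atoms, everything else)."""
    real, dummy, other = [], [], []
    for line in lines:
        tag = line[:6]
        if tag in COORD_RECORDS:
            real.append(line)
        elif tag == DUMMY_RECORD:
            dummy.append(line)
        else:
            # Keywords the reader does not know are kept but not interpreted.
            other.append(line)
    return real, dummy, other


def strip_dummies(lines):
    """Same effect as grepping the IMGATM records out of the file."""
    return [line for line in lines if line[:6] != DUMMY_RECORD]


def promote_dummies(lines):
    """Same effect as a global substitute of IMGATM with ATOM.

    The replacement is padded to six characters so that the fixed
    column positions of the remaining fields are unchanged.
    """
    return ["ATOM  " + line[6:] if line[:6] == DUMMY_RECORD else line
            for line in lines]
```

A refinement program following the suggestion above would call something like strip_dummies before refinement, while a viewer could use split_records to draw the dummy atoms differently.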
Re: [ccp4bb] what to do with disordered side chains
Hi Jacob,

The PDB header has a record for missing atoms. Coot has an option to find them and any decent validation software will warn about incomplete residues. There are PDBREPORT entries for every PDB file with a list of incomplete residues. If a user makes a very small effort, he doesn't have to go around clicking every 'alanine'.

Cheers,
Robbie

> Date: Mon, 4 Apr 2011 16:15:58 -0500
> From: j-kell...@fsm.northwestern.edu
> Subject: Re: [ccp4bb] what to do with disordered side chains
> To: CCP4BB@JISCMAIL.AC.UK
>
> I like your IMGATM proposal, but wouldn't it also potentially break
> some of the programs? Also--and this is a problem with deleting only
> sidechain atoms in general--it seems that many, myself included, might
> totally miss that an apparent "alanine" is really a trunco-lysine.
> What I like is that it does get around the problem of people
> over-interpreting bogus sidechains, but it falls short, perhaps, in
> misleading people about what residue is there. I, for one, would not
> feel that I had to click on all the alanines in a model to verify that
> they were not lysines, and would be surprised and puzzled for a while
> about why this ala said lys when I clicked on it. Wouldn't you be
> surprised? (Well, maybe not after this thread...)
>
> JPK
>
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud
> wrote:
> > The definition of _atom_site.occupancy is
> >
> >    The fraction of the atom type present at this site.
> >    The sum of the occupancies of all the atom types at this site
> >    may not significantly exceed 1.0 unless it is a dummy site.
> >
> > When an atom has an occupancy equal to zero that means that the
> > atom is NEVER present at that site - and that is not what you
> > intend to say. Setting the occupancy to zero does not mean that
> > a full atom is located somewhere in this area. Quite the opposite.
> >
> > (The reference to a dummy site is interesting and implies to
> > me that mmCIF already has the mechanism you wish for.)
Re: [ccp4bb] what to do with disordered side chains
It's nice to see that this discussion pops up every two years or so with exactly the same arguments :)

My vote (as always) is for leaving the atoms of disordered side chains in with high B values; the B values are part of the models. It's up to the popular Biologist's visualization software out there to properly display these models. I'm sure we can use all kinds of nice 4D blurry renderings of these disordered atoms nowadays.

Flip

On 4/4/2011 23:15, Jacob Keller wrote:
> I like your IMGATM proposal, but wouldn't it also potentially break some of the programs? Also--and this is a problem with deleting only sidechain atoms in general--it seems that many, myself included, might totally miss that an apparent "alanine" is really a trunco-lysine. What I like is that it does get around the problem of people over-interpreting bogus sidechains, but it falls short, perhaps, in misleading people about what residue is there. I, for one, would not feel that I had to click on all the alanines in a model to verify that they were not lysines, and would be surprised and puzzled for a while about why this ala said lys when I clicked on it. Wouldn't you be surprised? (Well, maybe not after this thread...)
>
> JPK
>
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud wrote:
>> The definition of _atom_site.occupancy is
>>
>>    The fraction of the atom type present at this site.
>>    The sum of the occupancies of all the atom types at this site
>>    may not significantly exceed 1.0 unless it is a dummy site.
>>
>> When an atom has an occupancy equal to zero that means that the atom is NEVER present at that site - and that is not what you intend to say. Setting the occupancy to zero does not mean that a full atom is located somewhere in this area. Quite the opposite.
>>
>> (The reference to a dummy site is interesting and implies to me that mmCIF already has the mechanism you wish for.)
Re: [ccp4bb] what to do with disordered side chains
I like your IMGATM proposal, but wouldn't it also potentially break some of the programs? Also--and this is a problem with deleting only sidechain atoms in general--it seems that many, myself included, might totally miss that an apparent "alanine" is really a trunco-lysine. What I like is that it does get around the problem of people over-interpreting bogus sidechains, but it falls short, perhaps, in misleading people about what residue is there. I, for one, would not feel that I had to click on all the alanines in a model to verify that they were not lysines, and would be surprised and puzzled for a while about why this ala said lys when I clicked on it. Wouldn't you be surprised? (Well, maybe not after this thread...)

JPK

On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud wrote:
> The definition of _atom_site.occupancy is
>
>    The fraction of the atom type present at this site.
>    The sum of the occupancies of all the atom types at this site
>    may not significantly exceed 1.0 unless it is a dummy site.
>
> When an atom has an occupancy equal to zero that means that the
> atom is NEVER present at that site - and that is not what you
> intend to say. Setting the occupancy to zero does not mean that
> a full atom is located somewhere in this area. Quite the opposite.
>
> (The reference to a dummy site is interesting and implies to
> me that mmCIF already has the mechanism you wish for.)
>
> Having some experience with refining low occupancy atoms and
> working with dummy marker atoms I'm quite confident that you can
> never define a B factor cutoff that would work. No matter what
> value you choose you will find some atoms in density that refine
> to values greater than the cutoff, or the limit you choose is so
> high that you will find marker atoms that refine to less than the
> limit. A B factor cutoff cannot work - no matter the value you
> choose you will always be plagued with false positives or false
> negatives.
Re: [ccp4bb] what to do with disordered side chains
On Mon, 2011-04-04 at 09:38 -0500, Jacob Keller wrote:
> Could it be that they are not normal because of all of the outlier,
> huge-b-factor sidechains?

That is part of it.

> If every exposed sidechain without real
> density gets a b-factor of 150, wouldn't that make a sizeable and
> illegitimate non-normal population?

It will surely skew the standard deviation to higher values.

> I would actually be curious about
> normality of b-factors--is there such a study/figure out there
> somewhere, with, say, histograms of b-factors of many individual
> structures?

Well, technically speaking they cannot possibly obey a normal distribution since they are always positive. Another reason is, of course, that there are different atom types (e.g. side chains versus backbone) and you have at best a multi-modal distribution. In my experience, the B-factor distributions always fail normality tests even when you break atoms into groups (the most "normal" are, somewhat expectedly, waters, yet they fail too).

Cheers,

Ed.

PS. If you keep going at this - start a new thread

--
"I'd jump in myself, if I weren't so good at whistling."
                               Julian, King of Lemurs
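
For anyone who wants to look at this on their own model, here is a rough sketch, assuming Python's standard library and fixed-column PDB records with the B factor in columns 61-66. The function names are made up for illustration. It is not a normality test. It only reports the skewness and compares the fraction of atoms above a "mean + 2 sigma" cutoff with the upper-tail fraction a normal distribution would predict, which is the assumption a Z-score cutoff quietly relies on.

```python
import statistics


def b_factors(pdb_lines):
    """Collect the B-factor field (columns 61-66) of ATOM/HETATM records."""
    return [float(line[60:66]) for line in pdb_lines
            if line[:6] in ("ATOM  ", "HETATM")]


def tail_summary(values, n_sigma=2.0):
    """Summarise how the upper tail compares with a normal distribution.

    Returns the mean, the sample standard deviation, a simple
    moment-based skewness estimate, the observed fraction of values
    above mean + n_sigma * SD, and the fraction a normal distribution
    would place above that same cutoff.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    cutoff = mean + n_sigma * sd
    observed = sum(v > cutoff for v in values) / len(values)
    expected = 1.0 - statistics.NormalDist().cdf(n_sigma)
    return {
        "mean": mean,
        "sd": sd,
        "skewness": skew,
        "cutoff": cutoff,
        "observed_tail": observed,
        "normal_tail": expected,
    }
```

If observed_tail is far from normal_tail, or the skewness is far from zero, a Z-score on the B factors cannot be read as a probability in the usual way.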
Re: [ccp4bb] what to do with disordered side chains
Hi Robbie,

> I updated my stripper program to remove all atoms with occ<0.00 instead of
> 0.00

- I used to do it in the past in phenix.model_vs_data but then I found it too boring since it was silently swallowing the problem cases -:) so I reverted it back to the "naive mode" where it takes what's in PDB as "God given" and computes the stats using it. This in turn points out problem cases, which is instructive, and which one can take care of on a second walk-through.

- Also, you will be missing things like this:

http://www.rcsb.org/pdb/files/3otj.pdb
http://www.rcsb.org/pdb/files/3kcj.pdb
(.. and so on, I have a full list)

where, I guess, "D" with negative occupancy actually means H -:)

All the best!
Pavel.
Re: [ccp4bb] what to do with disordered side chains
Nice one Pavel. PDB_REDO actually runs on these files but it's not pretty. I updated my stripper program to remove all atoms with occ<0.00 instead of 0.00

Cheers,
Robbie

> Date: Mon, 4 Apr 2011 07:26:23 -0700
> From: pafon...@gmail.com
> Subject: Re: [ccp4bb] what to do with disordered side chains
> To: CCP4BB@JISCMAIL.AC.UK
>
> Hi Dale,
>
>> Set the occupancy of a marker atom to -99.99. This will unambiguously mark the atom as an imaginary one. This will, of course, break every program that reads PDB format files,
>
> may be not every -:) phenix.model_vs_data works just fine with
>
> http://www.rcsb.org/pdb/files/1BQU.pdb
> http://www.rcsb.org/pdb/files/1azr.pdb
>
> (Um... I guess I just created some work for PDB_REDO folks, sorry -:) )
>
> All the best!
> Pavel.
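
A stripper of the kind mentioned here could look roughly like the sketch below. This is not Robbie's program. It assumes Python, well-formed fixed-column records with the occupancy in columns 55-60, and that the intended rule is "occupancy at or below zero", which is an assumption about the comparison rather than something stated in the thread. The function names are hypothetical.

```python
def occupancy(line):
    """Read the occupancy field (columns 55-60) of an ATOM/HETATM record."""
    return float(line[54:60])


def strip_nonpositive_occupancy(pdb_lines):
    """Split a model into kept lines and removed atom records.

    Atom records whose occupancy is zero or negative are removed;
    every other line, including non-atom records, is kept as-is.
    The removed records are returned too, so that the problem cases
    are reported instead of being silently swallowed.
    """
    kept, removed = [], []
    for line in pdb_lines:
        is_atom = line[:6] in ("ATOM  ", "HETATM")
        if is_atom and occupancy(line) <= 0.0:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed
```

Returning the removed records alongside the kept ones leaves room for the "second walk-through" approach, where the odd entries are inspected by hand.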
Re: [ccp4bb] what to do with disordered side chains
Dear James,

You make a very good point. So far we only discussed the option of removing all side chain atoms except for CB. What if only a few side chain atoms are outside the density? Should we just remove those? If we use the argument that we should remove the atoms we cannot see, then surely we should keep the ones we can see. A problem is that if someone else recalculates the maps and inspects them (which is the point of the EDS), some atoms may very well fall inside or outside the density differently than in the original crystallographic study: software changes, other reflections are included (remember the recent I/sigI discussion), and last (but not least) when atoms are removed the solvent mask changes.

Anyway, I don't think that the side chain discussion will be solved in this thread. PDB users are not all the same and treat the options that are proposed differently. They are all used in the PDB and that complicates matters for the users. In PDB_REDO (plug, plug ;) we build all missing side chains (and rebuild the zero-occupancy ones) and let the B-factor sort it out. Not everyone will agree with this, but at least it is consistent. If anyone studies a single structure properly, he should use the density and there is no problem. For statistical studies the first thing people do is filter by resolution and B-factor (and sequence identity) so the really bad side chains are removed from the test set anyway.

Cheers,
Robbie

> Date: Sun, 3 Apr 2011 23:45:10 -0700
> From: jmhol...@lbl.gov
> Subject: Re: [ccp4bb] what to do with disordered side chains
> To: CCP4BB@JISCMAIL.AC.UK
>
> At the risk of throwing a little gasoline on the flame war, what about
> side chains that will ALWAYS poke outside of the electron density? For
> example, pretty much any terminal aliphatic at 3.5 A resolution?
Re: [ccp4bb] what to do with disordered side chains
> Not likely - the distribution of ADPs is not normal, so you can't easily
> convert Z-scores to probabilities.

Could it be that they are not normal because of all of the outlier, huge-b-factor sidechains? If every exposed sidechain without real density gets a b-factor of 150, wouldn't that make a sizeable and illegitimate non-normal population? I would actually be curious about normality of b-factors--is there such a study/figure out there somewhere, with, say, histograms of b-factors of many individual structures?

--
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***
Re: [ccp4bb] what to do with disordered side chains
Hi Dale,

> Set the occupancy of a marker atom to -99.99.
> This will unambiguously mark the atom as an imaginary one. This
> will, of course, break every program that reads PDB format files,

may be not every -:) phenix.model_vs_data works just fine with

http://www.rcsb.org/pdb/files/1BQU.pdb
http://www.rcsb.org/pdb/files/1azr.pdb

(Um... I guess I just created some work for PDB_REDO folks, sorry -:) )

All the best!
Pavel.
Re: [ccp4bb] what to do with disordered side chains
On Sun, 2011-04-03 at 23:17 -0500, Jacob Keller wrote:
> While the b cutoff is still be tricky, I assume we could eventually
> come to consensus on some reasonable cutoff (2 sigma from the mean?)

Not likely - the distribution of ADPs is not normal, so you can't easily convert Z-scores to probabilities.

--
"I'd jump in myself, if I weren't so good at whistling."
                               Julian, King of Lemurs
Re: [ccp4bb] what to do with disordered side chains
The definition of _atom_site.occupancy is The fraction of the atom type present at this site. The sum of the occupancies of all the atom types at this site may not significantly exceed 1.0 unless it is a dummy site. When an atom has an occupancy equal to zero that means that the atom is NEVER present at that site - and that is not what you intend to say. Setting the occupancy to zero does not mean that a full atom is located somewhere in this area. Quite the opposite. (The reference to a dummy site is interesting and implies to me that mmCIF already has the mechanism you wish for.) Having some experience with refining low occupancy atoms and working with dummy marker atoms I'm quite confident that you can never define a B factor cutoff that would work. No matter what value you choose you will find some atoms in density that refine to values greater than the cutoff, or the limit you choose is so high that you will find marker atoms that refine to less than the limit. A B factor cutoff cannot work - no matter the value you choose you will always be plagued with false positives or false negatives. If you really want to stuff this bit into one of these fields you have to go all out. Set the occupancy of a marker atom to -99.99. This will unambiguously mark the atom as an imaginary one. This will, of course, break every program that reads PDB format files, but that is what should happen in any case. If you change the definition of the columns in the file you must mandate that all programs be upgraded to recognized the new definitions. I don't know how you can do that other than ensuring that the change will cause programs to cough. To try to slide it by with a magic value that will be silently accepted by existing programs is to beg for bugs and subtle side-effects. Good luck getting the maintainers of the mmCIF standard to accept a magic value in either of these fields. How about this: We already have the keywords ATOM and HETATM (and don't ask me why we have two). 
How about we create a new record in the PDB format, say IMGATM, that would have all the fields of an ATOM record but would be recognized as whatever the marker is for "dummy" atoms in the current mmCIF? Existing programs would completely ignore these atoms, as they should until they are modified to do something reasonable with them. Those of us who have no use for them can either use a switch in the program to ignore them or just grep them out of the file. Someone could write a program that would take a model with only ATOM and HETATM records and fill out all the desired IMGATM records (Let's call that program WASNIAHC, everyone would remember that!). This solution is unambiguous. It can be represented in current mmCIF, I think. The PDB could run WASNIAHC themselves after deposition but before acceptance by the depositor so people like me would not have to deal with them during refinement but would be able to see them before our precious works of art are unleashed on the world. Seems like a win-win solution to me. Dale Tronrud On 4/3/2011 9:17 PM, Jacob Keller wrote: Well, what about getting the default settings on the major molecular viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")? While the b cutoff is still be tricky, I assume we could eventually come to consensus on some reasonable cutoff (2 sigma from the mean?), and then this approach would allow each free-spirited crystallographer to keep his own preferred method of dealing with these troublesome sidechains and nary a novice would be led astray JPK On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett wrote: Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is. 
The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors. What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the "pain" associated with having to build back in missing side chains look completely trivial. As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing software out
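
For readers who want to see what the record-keyword idea would mean in practice, here is a minimal Python sketch. It assumes the hypothetical IMGATM keyword from the proposal above (it is not part of any PDB format specification), and the function names are invented for illustration. It shows a reader that sets dummy records aside, and the "global substitute" that turns them back into ATOM records.

```python
# IMGATM is the hypothetical record name proposed in this thread;
# it is NOT part of the PDB format. Record names occupy columns 1-6.
DUMMY_RECORD = "IMGATM"
ATOM_RECORDS = ("ATOM  ", "HETATM")


def split_records(pdb_lines):
    """Separate real atom records, proposed dummy records, and everything else."""
    real, dummy, other = [], [], []
    for line in pdb_lines:
        record = line[:6]
        if record in ATOM_RECORDS:
            real.append(line)
        elif record == DUMMY_RECORD:
            dummy.append(line)
        else:
            other.append(line)
    return real, dummy, other


def promote_dummies(pdb_lines):
    """Rewrite IMGATM records as ATOM records, leaving all other lines alone."""
    return [
        "ATOM  " + line[6:] if line[:6] == DUMMY_RECORD else line
        for line in pdb_lines
    ]
```

A program that only looks for ATOM and HETATM would behave like `split_records` with the `dummy` list thrown away, which is the "silently ignored" behaviour described above.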
Re: [ccp4bb] what to do with disordered side chains
At the risk of throwing a little gasoline on the flame war, what about side chains that will ALWAYS poke outside of the electron density? For example, pretty much any terminal aliphatic at 3.5 A resolution?

I first learned this about 15 years ago when I made this movie:

http://bl831.als.lbl.gov/~jamesh/movies/resolution.mpeg

For those of you whose browser no longer supports MPEG-1, this is a movie of a calculated (aka noise-free) electron density map, contoured at "1 sigma", but cut to the resolution shown after applying an overall B factor sufficient to suppress series-termination. By that I mean the maps don't look all that different with or without the cutoff. The coordinates shown are the "correct" model used to calculate the map.

At about 3.5 A you start to see side chains poking out of the density, and at 6 A, all the side chains are "gone". Does this mean they should be modeled with zero occupancy? ;)

-James Holton
MAD Scientist

On 4/3/2011 9:57 PM, Maia Cherney wrote:

I guess, most hydrophilic side chains on the surface are flexible, they don't keep the same conformation. If you cut those side chains off, the surface will be looking pretty hydrophobic and misleading (and very horrible). I prefer to see them intact. I know, most of them are flexible and don't have one exact position, but it's OK. I know they are there not far from the main chain. Usually, their exact position is irrelevant.

Maia

Jacob Keller wrote:

Well, what about getting the default settings on the major molecular viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
While the b cutoff is still tricky, I assume we could eventually come to consensus on some reasonable cutoff (2 sigma from the mean?), and then this approach would allow each free-spirited crystallographer to keep his own preferred method of dealing with these troublesome sidechains and nary a novice would be led astray.

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett wrote:

Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is. The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors.

What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the "pain" associated with having to build back in missing side chains look completely trivial.

As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing software out there to understand the hack. A file format that isn't designed with this sort of feature from day one is not going to be fixable as a practical matter after so much legacy code has accumulated.
-Eric On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote: To the delete-the-atom-nik's: do you propose deleting the whole residue or just the side chain? I can understand deleting the whole residue, but deleting only the side chain seems to me to be placing a stumbling block also, and even possibly confusing for an experienced crystallographer: the .pdb says "lys" but it looks like an ala? Which is it? I could imagine a lot of frustration-hours arising from this practice, with people cross-checking sequences, looking in the methods sections for mutations... JPK
Re: [ccp4bb] what to do with disordered side chains
I guess most hydrophilic side chains on the surface are flexible; they don't keep the same conformation. If you cut those side chains off, the surface will look pretty hydrophobic and misleading (and very horrible). I prefer to see them intact. I know most of them are flexible and don't have one exact position, but it's OK. I know they are there, not far from the main chain. Usually, their exact position is irrelevant.

Maia

Jacob Keller wrote:

Well, what about getting the default settings on the major molecular viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")? While the b cutoff is still tricky, I assume we could eventually come to consensus on some reasonable cutoff (2 sigma from the mean?), and then this approach would allow each free-spirited crystallographer to keep his own preferred method of dealing with these troublesome sidechains and nary a novice would be led astray.

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett wrote:

Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is. The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors.

What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the "pain" associated with having to build back in missing side chains look completely trivial.
As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing software out there to understand the hack. A file format that isn't designed with this sort of feature from day one is not going to be fixable as a practical matter after so much legacy code has accumulated. -Eric On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote: To the delete-the-atom-nik's: do you propose deleting the whole residue or just the side chain? I can understand deleting the whole residue, but deleting only the side chain seems to me to be placing a stumbling block also, and even possibly confusing for an experienced crystallographer: the .pdb says "lys" but it looks like an ala? Which is it? I could imagine a lot of frustration-hours arising from this practice, with people cross-checking sequences, looking in the methods sections for mutations... JPK
Re: [ccp4bb] what to do with disordered side chains
Well, what about getting the default settings on the major molecular viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")? While the b cutoff is still tricky, I assume we could eventually come to consensus on some reasonable cutoff (2 sigma from the mean?), and then this approach would allow each free-spirited crystallographer to keep his own preferred method of dealing with these troublesome sidechains and nary a novice would be led astray.

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett wrote:
> Most non-structural users are familiar with the sequence of the proteins they
> are studying, and most software does at least display residue identity if you
> select an atom in a residue, so usually it is not necessary to do any cross
> checking besides selecting an atom in the residue and seeing what its residue
> name is. The chance of somebody misinterpreting a truncated Lys as Ala is,
> in my experience, much much lower than the chance they will trust the xyz
> coordinates of atoms with zero occupancy or high B factors.
>
> What worries me the most is somebody designing a whole biological experiment
> around an over-interpretation of details that are implied by xyz coordinates
> of atoms, even if those atoms were not resolved in the maps. When this sort
> of error occurs it is a level of pain and wasted effort that makes the "pain"
> associated with having to build back in missing side chains look completely
> trivial.
>
> As long as the PDB file format is the way users get structural data, there is
> really no good way to communicate "atom exists with no reliable coordinates"
> to the user, given the diversity of software packages out there for reading
> PDB files and the historical lack of any standard way of dealing with this
> issue. Even if the file format is hacked there is no way to force all the
> existing software out there to understand the hack. A file format that isn't
> designed with this sort of feature from day one is not going to be fixable as
> a practical matter after so much legacy code has accumulated.
>
> -Eric
>
> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
>
>> To the delete-the-atom-nik's: do you propose deleting the whole
>> residue or just the side chain? I can understand deleting the whole
>> residue, but deleting only the side chain seems to me to be placing a
>> stumbling block also, and even possibly confusing for an experienced
>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
>> is it? I could imagine a lot of frustration-hours arising from this
>> practice, with people cross-checking sequences, looking in the methods
>> sections for mutations...
>>
>> JPK

--
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***
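
To make the "novice mode" suggestion concrete, here is a rough Python sketch of the rule as stated: hide an atom if its occupancy is zero or its B factor exceeds the mean by some number of standard deviations. The function name, the default of 2 sigma, and the use of the population standard deviation over all atoms in the model are assumptions for illustration; no viewer is known to implement exactly this.

```python
import statistics


def novice_mode_hidden(occupancies, b_factors, n_sigma=2.0):
    """Return one boolean per atom: True if the atom would be hidden.

    An atom is hidden when its occupancy is zero, or when its B factor
    is above mean(B) + n_sigma * stdev(B) computed over all atoms given.
    """
    cutoff = statistics.mean(b_factors) + n_sigma * statistics.pstdev(b_factors)
    return [
        occ == 0.0 or b > cutoff
        for occ, b in zip(occupancies, b_factors)
    ]
```

Because the cutoff is computed from the model itself, it shifts with every structure, so a given B value may be hidden in one entry and shown in another.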
Re: [ccp4bb] what to do with disordered side chains
Most non-structural users are familiar with the sequence of the proteins they are studying, and most software does at least display residue identity if you select an atom in a residue, so usually it is not necessary to do any cross checking besides selecting an atom in the residue and seeing what its residue name is. The chance of somebody misinterpreting a truncated Lys as Ala is, in my experience, much much lower than the chance they will trust the xyz coordinates of atoms with zero occupancy or high B factors. What worries me the most is somebody designing a whole biological experiment around an over-interpretation of details that are implied by xyz coordinates of atoms, even if those atoms were not resolved in the maps. When this sort of error occurs it is a level of pain and wasted effort that makes the "pain" associated with having to build back in missing side chains look completely trivial. As long as the PDB file format is the way users get structural data, there is really no good way to communicate "atom exists with no reliable coordinates" to the user, given the diversity of software packages out there for reading PDB files and the historical lack of any standard way of dealing with this issue. Even if the file format is hacked there is no way to force all the existing software out there to understand the hack. A file format that isn't designed with this sort of feature from day one is not going to be fixable as a practical matter after so much legacy code has accumulated. -Eric On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote: > To the delete-the-atom-nik's: do you propose deleting the whole > residue or just the side chain? I can understand deleting the whole > residue, but deleting only the side chain seems to me to be placing a > stumbling block also, and even possibly confusing for an experienced > crystallographer: the .pdb says "lys" but it looks like an ala? Which > is it? 
I could imagine a lot of frustration-hours arising from this > practice, with people cross-checking sequences, looking in the methods > sections for mutations... > > JPK >
Re: [ccp4bb] what to do with disordered side chains
I vote for the electron density irrespective of side chains, main chains, ligands, dark matter. The PDB is a collection of experimentally determined structures per its own definition. If density supports it, high B is fine - B-factor simply is a parameter of a probability distribution. If you extend that to no density - it becomes problematic. After all, coordinates imply that an atom is actually at some specified place with a certain probability. We may know that the atom necessarily has to be someplace unless it got chewed off for some reason. The experiment just tells you that you do not know where the atom is.

Or in Rumsfeldic: Better a known unknown than an unknown known.

Cheers, BR

-Original Message-
From: Boaz Shaanan [mailto:bshaa...@exchange.bgu.ac.il]
Sent: Sunday, April 03, 2011 11:02 AM
To: hofkristall...@gmail.com; CCP4BB@JISCMAIL.AC.UK
Subject: RE: [ccp4bb] what to do with disordered side chains

The original posting that started this thread referred to side-chains, as the subject still suggests. Do you propose to omit only side-chain atoms, in which case you end up with different residues, as pointed out by quite a few people, or do you suggest also to omit the main-chain atoms of the problematic residues? Besides, as mentioned by Phoebe and others, many users (non-crystallographers) of PDB's know already the meaning of the B-factor and will know how to interpret a very high B. It is our task (the crystallographers) to enlighten those who don't know what the B column in a PDB entry stands for. I certainly do and I'm sure many of us do so too. I voted for high B and would vote for it again, if asked.

Cheers,

Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
Phone: 972-8-647-2220 Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bernhard Rupp (Hofkristallrat a.D.)
[hofkristall...@gmail.com]
Sent: Sunday, April 03, 2011 7:42 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] what to do with disordered side chains

Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them.

Joel L. Sussman

Concur. BMC p 680 'How to handle missing parts'

Best wishes, BR

On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote:

Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users.

But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with zero occupancy or high B values, for example.

I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54).

Frances Bernstein

Bernstein + Sons
Information Systems Consultants
5 Brewster Lane, Bellport, NY 11713-2803
Frances C. Bernstein
f...@bernstein-plus-sons.com
1-631-286-1339 FAX: 1-631-286-1999

On Sat, 2 Apr 2011, Jacob Keller wrote:

I guess I missed it in the flurry of replies to this thread over the last few days, but what exactly is so terrible about keeping the atoms (since you have chemical evidence from protein sequence that they are there, and even if there is X-ray damage they were originally there and are likely still there in a subset of the molecules), but changing occupancy to zero as an acknowledgment that your data does not provide evidence to support a specific atomic position for these atoms?

Some users might pull up the structure, see those atoms, and think their positions were based on data, which they were not, and then draw conclusions based on them. I agree that occ=0 is tantamount to the suggestion you queried, however.

A somewhat key question might be: across the various molecular visualization programs, what is the default way to handle atoms with occ=0? Perhaps those programs might be the best place to fix the problem...

JPK

***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***
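
The "beyond column 54" remark refers to the fixed-column layout of ATOM/HETATM records: coordinates end at column 54, occupancy sits in columns 55-60 and the B factor in columns 61-66. The Python sketch below shows a reader that does look at those columns; the function names are made up for illustration, and it assumes well-formed records with all fields present.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM line by fixed columns (1-based columns in comments)."""
    return {
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "xyz": (
            float(line[30:38]),           # columns 31-38: x
            float(line[38:46]),           # columns 39-46: y
            float(line[46:54]),           # columns 47-54: z
        ),
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
    }


def has_zero_occupancy(line):
    """True if the record carries occupancy 0.00."""
    return parse_atom_record(line)["occupancy"] == 0.0
```

A reader that stops at `line[46:54]` gets a perfectly usable set of coordinates and never learns that the atom was flagged, which is the failure mode described above.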
Re: [ccp4bb] what to do with disordered side chains
Clearly there are strong feelings held by the advocates of the several solutions to the problem of what to do about atoms that cannot be reliably placed based on the electron density map. I certainly understand since I passionately support my own favorite solution. Why is it that a community of generally reasonable people keeps coming back to this same issue and yet fails to find a solution that can reach some kind of consensus?

My 2 cents on this, more fundamental, issue:

A model created by someone who believes that all atoms (for a residue with any atoms) must be built will contain two kinds of atoms. Those placed based on the appearance of the electron density and those placed in some convenient location simply to fill out the atom count. I think most everyone agrees that a full residue is a convenience for some users of our models. What those of us who favor partial models want is an absolutely clear distinction between the two classes of atoms.

All this needs is a bit. Literally, one bit of data that flags those atoms added to the model simply to complete the set. Why can't we come to a solution that satisfies? Because we continue to use a non-extensible file format that does not allow us a place to put this bit.

Some people want to put the bit in the occupancy column by defining a special value (occ=0) that would be the flag. Some people want to put it in the B factor column by defining a special value there (a couple possibilities here, B=1000.00, B=500.00, B varying but larger than that of any atom built into density).

The B factor and occupancy columns in the PDB file have been precisely defined back when the mmCIF dictionary was created and to change their definitions now would require opening that process again. I am pretty sure that the committee in charge will never allow a definition for these items that includes the phrase "... except when the value is equal to ...". You can't run a database that way.
Each piece of information has to have its own tag and definition. Once it is defined we can embrace the task of educating software developers and our collaborators who use our models about its meaning.

There is just no place to put this bit in a PDB format file. mmCIF - it's trivial. PDB format - no way. As long as we insist that this format is the preferred means of distributing our models we will continue to return to this argument again and again with no possibility of coming to a solution.

Dale Tronrud

P.S. I've even thought about using the model of the "REMARK" statement, where all sorts of information have been added by the hack of "standardized remarks". I thought that one could create a "standardized footnote" that would mark the atoms as "imaginary". I found that, unfortunately, footnotes were removed from the PDB format many years ago.

On 4/3/2011 11:01 AM, Boaz Shaanan wrote:

The original posting that started this thread referred to side-chains, as the subject still suggests. Do you propose to omit only side-chain atoms, in which case you end up with different residues, as pointed out by quite a few people, or do you suggest also to omit the main-chain atoms of the problematic residues? Besides, as mentioned by Phoebe and others, many users (non-crystallographers) of PDB's know already the meaning of the B-factor and will know how to interpret a very high B. It is our task (the crystallographers) to enlighten those who don't know what the B column in a PDB entry stands for. I certainly do and I'm sure many of us do so too. I voted for high B and would vote for it again, if asked.

Cheers,

Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
Phone: 972-8-647-2220 Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bernhard Rupp (Hofkristallrat a.D.)
[hofkristall...@gmail.com] Sent: Sunday, April 03, 2011 7:42 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] what to do with disordered side chains Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them. Joel L. Sussman Concur. BMC p 680 ‘How to handle missing parts’ Best wishes, BR On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote: Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users. But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with ze
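
The "one bit with its own tag" argument can be illustrated with a small Python sketch. The field name `built_from_density` is purely hypothetical (it is not an mmCIF item or anything defined by the PDB); the point is only that a separate field leaves occupancy and B factor free to keep their defined meanings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str
    res_name: str
    res_seq: int
    occupancy: float          # keeps its ordinary meaning
    b_factor: float           # keeps its ordinary meaning
    built_from_density: bool  # hypothetical flag: the "one bit"


def observed_only(atoms):
    """Keep only atoms that were placed on the basis of electron density."""
    return [atom for atom in atoms if atom.built_from_density]


def placeholders(atoms):
    """Keep only atoms that were added to complete the residue."""
    return [atom for atom in atoms if not atom.built_from_density]
```

With the flag stored separately, no magic occupancy or B value is needed, and a consumer who wants the partial model simply calls `observed_only`.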
Re: [ccp4bb] what to do with disordered side chains
Hi,

It is quite possible to truncate, say, a Lys residue, isn't it? So why not do this? It doesn't change the identity of the residue but precisely draws attention to the fact that atoms are missing due to lack of density.

And if you click on an atom in PyMOL, at least I don't see the B-factor displayed anywhere. I would suspect it's the same with other molecular graphics software to a fair extent, or it's in some small print somewhere...

Also, can you actually tell by just looking at the B-factor whether there is any density or not? If the Wilson B is high, I suspect you can see density and the B-factor will be high, whereas if the Wilson B is low, the same B-factor will probably mean you don't see density at the same sigma cutoff/contour level. I may be wrong, but I suspect this is the case, which is why I think it's probably better to truncate them (not to Ala/Gly, but to truncate) if you don't see the density for the side chain at all. OR model the 5 most likely conformers then, or 4? 6? 3? Well, this can go on forever, or rather hopefully NOT, but really I don't think this is quite so simple when it comes to B-factors and later analysis, in particular if that will be in any way automated and will deal with, say, a larger set of coordinate files.

Is it really a good idea to leave an active site residue side chain with high B (= no density whatsoever) in, in _one_ defined conformation? I am not so dead certain...

Cheers,
Tommi

On Apr 3, 2011, at 9:01 PM, Boaz Shaanan wrote:

The original posting that started this thread referred to side-chains, as the subject still suggests. Do you propose to omit only side-chain atoms, in which case you end up with different residues, as pointed out by quite a few people, or do you suggest also to omit the main-chain atoms of the problematic residues? Besides, as mentioned by Phoebe and others, many users (non-crystallographers) of PDB's know already the meaning of the B-factor and will know how to interpret a very high B.
Re: [ccp4bb] what to do with disordered side chains
On Sunday, April 03, 2011, Jacob Keller wrote:
> To the delete-the-atom-nik's: do you propose deleting the whole
> residue or just the side chain?

Omit the atoms beyond CB for which there is no apparent density. Always place CB if the backbone trace is reasonable, because its location is fixed a priori by known stereochemistry. As a practical matter, I use Coot's "stub" command.

Ethan

> I can understand deleting the whole
> residue, but deleting only the side chain seems to me to be placing a
> stumbling block also, and even possibly confusing for an experienced
> crystallographer: the .pdb says "lys" but it looks like an ala? Which
> is it? I could imagine a lot of frustration-hours arising from this
> practice, with people cross-checking sequences, looking in the methods
> sections for mutations...
>
> JPK
>
> On Sun, Apr 3, 2011 at 11:42 AM, Bernhard Rupp (Hofkristallrat a.D.)
> wrote:
> > Thus my feeling is that if one does NOT see the coords in the electron
> > density, they should NOT be included, and let someone else try to model
> > them in, but they should be aware that they are modeling them.
> >
> > Joel L. Sussman
> >
> > Concur. BMC p 680 ‘How to handle missing parts’
> >
> > Best wishes, BR
> >
> > On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote:
> >
> > Doing something sensible in the major software packages, both
> > for graphics and for other analysis of the structure, could
> > solve the problem for most users.
> >
> > But nobody knows what other software is out there being used by
> > individuals or small groups. And the more remote the authors
> > of that software are from protein structure solution the more
> > likely it is that they have not/will not properly handle atoms
> > with zero occupancy or high B values, for example.
> > > > I am absolutely positive that there is software that does its > > voodoo on ATOM/HETATM records and pays absolutely no attention > > to anything beyond the x, y, z coordinates (i.e. beyond column 54). > > > >Frances Bernstein > > > > = > > Bernstein + Sons > > * * Information Systems Consultants > > 5 Brewster Lane, Bellport, NY 11713-2803 > > * * *** > > *Frances C. Bernstein > > * *** f...@bernstein-plus-sons.com > > *** * > > * *** 1-631-286-1339FAX: 1-631-286-1999 > > = > > > > On Sat, 2 Apr 2011, Jacob Keller wrote: > > > > I guess I missed it in the flurry of replies to this thread over the > > > > last few days, but what exactly is so terrible about keeping the atoms > > > > (since you have chemical evidence from protein sequence that they are > > > > there, and even if there is X-ray damage they were originally there and > > > > are likely still there in a subset of the molecules), but changing > > > > occupancy to zero as an acknowledgment that your data does not provide > > > > evidence to support a specific atomic position for these atoms? > > > > > > > > Some users might pull up the structure, see those atoms, and think > > > > their positions were based on data, which they were not, and then draw > > > > conclusions based on them. I agree that occ=0 is tantamount to the > > > > suggestion you queried, however. > > > > > > > > A somewhat key question might be: across the various molecular > > > > visualization programs, what is the default way to handle atoms with > > > > occ=0? Perhaps those programs might be the best place to fix the > > > > problem... > > > > > > > > JPK > > > > > > > > > > > > *** > > > > Jacob Pearson Keller > > > > Northwestern University > > > > Medical Scientist Training Program > > > > cel: 773.608.9185 > > > > email: j-kell...@northwestern.edu > > > > *** > > > > > > > > > > > >
Re: [ccp4bb] what to do with disordered side chains
To the delete-the-atom-nik's: do you propose deleting the whole residue or just the side chain? I can understand deleting the whole residue, but deleting only the side chain seems to me to be placing a stumbling block also, and even possibly confusing for an experienced crystallographer: the .pdb says "lys" but it looks like an ala? Which is it? I could imagine a lot of frustration-hours arising from this practice, with people cross-checking sequences, looking in the methods sections for mutations... JPK On Sun, Apr 3, 2011 at 11:42 AM, Bernhard Rupp (Hofkristallrat a.D.) wrote: > Thus my feeling is that if one does NOT see the coords in the electron > > density, they should NOT be included, and let someone else try to model > > them in, but they should be aware that they are modeling them. > > Joel L. Sussman > > > > Concur. BMC p 680 ‘How to handle missing parts’ > > > > Best wishes, BR > > > > On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote: > > > > Doing something sensible in the major software packages, both > for graphics and for other analysis of the structure, could > solve the problem for most users. > > But nobody knows what other software is out there being used by > individuals or small groups. And the more remote the authors > of that software are from protein structure solution the more > likely it is that they have not/will not properly handle atoms > with zero occupancy or high B values, for example. > > I am absolutely positive that there is software that does its > voodoo on ATOM/HETATM records and pays absolutely no attention > to anything beyond the x, y, z coordinates (i.e. beyond column 54). > > Frances Bernstein > > = > Bernstein + Sons > * * Information Systems Consultants > 5 Brewster Lane, Bellport, NY 11713-2803 > * * *** > * Frances C. 
Bernstein > * *** f...@bernstein-plus-sons.com > *** * > * *** 1-631-286-1339 FAX: 1-631-286-1999 > = > > On Sat, 2 Apr 2011, Jacob Keller wrote: > > I guess I missed it in the flurry of replies to this thread over the > > last few days, but what exactly is so terrible about keeping the atoms > > (since you have chemical evidence from protein sequence that they are > > there, and even if there is X-ray damage they were originally there and > > are likely still there in a subset of the molecules), but changing > > occupancy to zero as an acknowledgment that your data does not provide > > evidence to support a specific atomic position for these atoms? > > > > Some users might pull up the structure, see those atoms, and think > > their positions were based on data, which they were not, and then draw > > conclusions based on them. I agree that occ=0 is tantamount to the > > suggestion you queried, however. > > > > A somewhat key question might be: across the various molecular > > visualization programs, what is the default way to handle atoms with > > occ=0? Perhaps those programs might be the best place to fix the > > problem... > > > > JPK > > > > > > *** > > Jacob Pearson Keller > > Northwestern University > > Medical Scientist Training Program > > cel: 773.608.9185 > > email: j-kell...@northwestern.edu > > *** > > > > -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
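The "the .pdb says lys but it looks like an ala" situation described above can be checked for mechanically. The following is a minimal sketch, not a tool anyone in the thread mentions: it compares the atom names present in each residue against a small, hand-written table of expected heavy atoms (only three residue types are listed, for illustration) and reports which atoms are absent. The function name `find_truncated_residues` and the table `EXPECTED_ATOMS` are hypothetical; the column positions follow the fixed-column ATOM record layout of the PDB format.

```python
# Expected heavy-atom names for a few residue types (standard PDB atom names).
# Illustrative only: a real check would cover all residue types.
EXPECTED_ATOMS = {
    "ALA": {"N", "CA", "C", "O", "CB"},
    "SER": {"N", "CA", "C", "O", "CB", "OG"},
    "LYS": {"N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"},
}


def find_truncated_residues(pdb_lines):
    """Return {(chain, res_seq, res_name): sorted list of missing atom names}."""
    present = {}
    for line in pdb_lines:
        if line[:6] != "ATOM  ":
            continue  # only ATOM records are considered in this sketch
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain = line[21]                  # column 22
        res_seq = int(line[22:26])        # columns 23-26
        present.setdefault((chain, res_seq, res_name), set()).add(atom_name)

    truncated = {}
    for key, atoms in present.items():
        expected = EXPECTED_ATOMS.get(key[2])
        if expected is None:
            continue  # residue type not in the small illustrative table
        missing = expected - atoms
        if missing:
            truncated[key] = sorted(missing)
    return truncated
```

A lysine deposited with atoms only out to CB would be reported with CG, CD, CE and NZ missing, while a genuine alanine would not be reported at all.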
Re: [ccp4bb] what to do with disordered side chains
The original posting that started this thread referred to side chains, as the subject still suggests. Do you propose to omit only side-chain atoms, in which case you end up with different residues, as pointed out by quite a few people, or do you also suggest omitting the main-chain atoms of the problematic residues?

Besides, as mentioned by Phoebe and others, many users (non-crystallographers) of PDB entries already know the meaning of the B-factor and will know how to interpret a very high B. It is our task (the crystallographers) to enlighten those who don't know what the B column in a PDB entry stands for. I certainly do, and I'm sure many of us do so too. I voted for high B and would vote for it again, if asked.

Cheers,
Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
Phone: 972-8-647-2220
Skype: boaz.shaanan
Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] On Behalf Of Bernhard Rupp (Hofkristallrat a.D.) [hofkristall...@gmail.com]
Sent: Sunday, April 03, 2011 7:42 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] what to do with disordered side chains

Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them.
Joel L. Sussman

Concur. BMC p 680 'How to handle missing parts'

Best wishes, BR

On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote:

Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users.

But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with zero occupancy or high B values, for example.

I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54).

Frances Bernstein

On Sat, 2 Apr 2011, Jacob Keller wrote:

I guess I missed it in the flurry of replies to this thread over the last few days, but what exactly is so terrible about keeping the atoms (since you have chemical evidence from protein sequence that they are there, and even if there is X-ray damage they were originally there and are likely still there in a subset of the molecules), but changing occupancy to zero as an acknowledgment that your data does not provide evidence to support a specific atomic position for these atoms?

Some users might pull up the structure, see those atoms, and think their positions were based on data, which they were not, and then draw conclusions based on them. I agree that occ=0 is tantamount to the suggestion you queried, however.

A somewhat key question might be: across the various molecular visualization programs, what is the default way to handle atoms with occ=0? Perhaps those programs might be the best place to fix the problem...

JPK
Re: [ccp4bb] what to do with disordered side chains
Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them. Joel L. Sussman Concur. BMC p 680 'How to handle missing parts' Best wishes, BR On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote: Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users. But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with zero occupancy or high B values, for example. I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54). Frances Bernstein = Bernstein + Sons * * Information Systems Consultants 5 Brewster Lane, Bellport, NY 11713-2803 * * *** *Frances C. Bernstein * *** f...@bernstein-plus-sons.com *** * * *** 1-631-286-1339FAX: 1-631-286-1999 = On Sat, 2 Apr 2011, Jacob Keller wrote: I guess I missed it in the flurry of replies to this thread over the last few days, but what exactly is so terrible about keeping the atoms (since you have chemical evidence from protein sequence that they are there, and even if there is X-ray damage they were originally there and are likely still there in a subset of the molecules), but changing occupancy to zero as an acknowledgment that your data does not provide evidence to support a specific atomic position for these atoms? Some users might pull up the structure, see those atoms, and think their positions were based on data, which they were not, and then draw conclusions based on them. I agree that occ=0 is tantamount to the suggestion you queried, however. 
A somewhat key question might be: across the various molecular visualization programs, what is the default way to handle atoms with occ=0? Perhaps those programs might be the best place to fix the problem... JPK *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
Re: [ccp4bb] what to do with disordered side chains
If these users exist (I don't doubt that they do), then they might also think that lysine residues sometimes look identical to alanine - if the atoms after the beta carbon of the lysine are deleted in the PDB file due to lack of density.

So, I guess, if one's objective in solving structures is to provide these users with coordinates that they could use, then it would make more sense to me to find out what one's "customers" want, rather than speculating about it. Or at least, train one's "customers" how to use one's products - I believe that's what people in business do.

However, I think that many people solve structures for their own consumption - they are their own customers - therefore, it's really up to them to cook it any way they find most tasteful. Others can agree or disagree, but we know that not everybody has the same taste.

Cheers,
Quyen

On Apr 3, 2011, at 12:54 AM, Prof. Joel L. Sussman wrote:

> I think Frances is right, i.e. most non crystallographers ignore
> "anything beyond the x, y, z coordinates (i.e. beyond column 54)"
> [as Frances wrote].
> Thus if a crystallographer put in coords that he/she does NOT see,
> even with OCC=0, or an enormously large Bfactor, these coords are usually
> treated in just the same way that experimentally observed coords are treated.
> Thus my feeling is that if one does NOT see the coords in the electron
> density, they should NOT be included, and let someone else try to model
> them in, but they should be aware that they are modeling them.
> Joel L. Sussman
>
> On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote:
>
>> Doing something sensible in the major software packages, both
>> for graphics and for other analysis of the structure, could
>> solve the problem for most users.
>>
>> But nobody knows what other software is out there being used by
>> individuals or small groups. And the more remote the authors
>> of that software are from protein structure solution the more
>> likely it is that they have not/will not properly handle atoms
>> with zero occupancy or high B values, for example.
>>
>> I am absolutely positive that there is software that does its
>> voodoo on ATOM/HETATM records and pays absolutely no attention
>> to anything beyond the x, y, z coordinates (i.e. beyond column 54).
>>
>> Frances Bernstein
>>
>> On Sat, 2 Apr 2011, Jacob Keller wrote:
>>
>>>> I guess I missed it in the flurry of replies to this thread over the
>>>> last few days, but what exactly is so terrible about keeping the atoms
>>>> (since you have chemical evidence from protein sequence that they are
>>>> there, and even if there is X-ray damage they were originally there and
>>>> are likely still there in a subset of the molecules), but changing
>>>> occupancy to zero as an acknowledgment that your data does not provide
>>>> evidence to support a specific atomic position for these atoms?
>>>
>>> Some users might pull up the structure, see those atoms, and think
>>> their positions were based on data, which they were not, and then draw
>>> conclusions based on them. I agree that occ=0 is tantamount to the
>>> suggestion you queried, however.
>>>
>>> A somewhat key question might be: across the various molecular
>>> visualization programs, what is the default way to handle atoms with
>>> occ=0? Perhaps those programs might be the best place to fix the
>>> problem...
>>>
>>> JPK
Re: [ccp4bb] Jrh input Re: [ccp4bb] what to do with disordered side chains
Dear Ed, Many thanks for this careful explanation, which I appreciate. I realise in my own practise on such matters I have two situations :- (i) where, albeit limited, electron density evidence, coupled with chemical evidence, leads to an attempted model atomic fit but the B factors there sky rocket. (ii) there is no electron density evidence and even though the chemical evidence is sound one can in fact add nothing. Here I simply concede that there is nothing one can do. The issue here is not whether to keep these atoms but simply you really cannot make a start finding them. Greetings, John On Fri, Apr 1, 2011 at 4:03 PM, Ed Pozharski wrote: > Dear John, > > there may be reasons to disagree with both options. This has been a > recurring discussion for many years, and in my mind the most convincing > arguments for both sides are as follows: > > "Keepers": > > I know the side chain is there and the high ADP is a good approximation > of reality. Removing atoms causes such a mess for the end user. > > "Deleters": > > We don't model missing loops, termini, ligands and waters when there is > no density, and side chains should not be treated differently. Most end > users think ADP is a nucleotide and will over-interpret the model. > > I am a "keeper" when it comes to end user treatment, but a recently > converted "deleter" when it comes to modeling (a rather stressful > position). So I am not taking sides really, but rather looking for a > middle way. (Have to admit that my secret goal was to knock down the > zero occupancy fallacy :) > > Perhaps these ideas are worth exploring: > > 1. Provide dual representation - a crystallographic model and an > end-user model, both downloadable from the PDB. > 2. Model missing side chains "NMR-way" > 3. A new data file format is needed (mmCIF?) that combines atomic model > with electron density, and visualization/analysis software shall be > modified to always utilize the experimental data > 4. 
Implement reduced ADP restraints for disordered side chains to > further reduce model bias > > But ultimately, as long as experimental data is deposited, I believe > that people are free to interpret their data the way they see fit. > Others are then free to look at the electron density and become outraged > at the interpretation. > > Cheers, > > Ed. > > On Thu, 2011-03-31 at 23:25 +0100, Jrh wrote: >> Dear Ed, >> Thankyou for this and apologies for late reply. >> If one has chemical evidence for the presence of residues but these >> residues are disordered I find the delete atoms option disagreeable. >> Such a static disorder situation should be described by a high atomic >> displacement parameter, in my view. (nb the use of ADP is better than >> B factor terminology). >> Yours sincerely, >> John >> Prof John R Helliwell DSc >> > > -- > "I'd jump in myself, if I weren't so good at whistling." > Julian, King of Lemurs > > -- Professor John R Helliwell DSc
Re: [ccp4bb] what to do with disordered side chains
I think Frances is right, i.e. most non crystallographers ignore "anything beyond the x, y, z coordinates (i.e. beyond column 54)" [as Frances wrote]. Thus if a crystallographer put in coords that he/she does NOT see, even with OCC=0, or an enormously large Bfactor, these coords are usually treated in just the same way that experimentally observed coords are treated. Thus my feeling is that if one does NOT see the coords in the electron density, they should NOT be included, and let someone else try to model them in, but they should be aware that they are modeling them. Joel L. Sussman On 3 Apr 2011, at 06:15, Frances C. Bernstein wrote: > Doing something sensible in the major software packages, both > for graphics and for other analysis of the structure, could > solve the problem for most users. > > But nobody knows what other software is out there being used by > individuals or small groups. And the more remote the authors > of that software are from protein structure solution the more > likely it is that they have not/will not properly handle atoms > with zero occupancy or high B values, for example. > > I am absolutely positive that there is software that does its > voodoo on ATOM/HETATM records and pays absolutely no attention > to anything beyond the x, y, z coordinates (i.e. beyond column 54). > >Frances Bernstein > > = > Bernstein + Sons > * * Information Systems Consultants > 5 Brewster Lane, Bellport, NY 11713-2803 > * * *** > *Frances C. 
Bernstein > * *** f...@bernstein-plus-sons.com > *** * > * *** 1-631-286-1339FAX: 1-631-286-1999 > = > > On Sat, 2 Apr 2011, Jacob Keller wrote: > >>> I guess I missed it in the flurry of replies to this thread over the >>> last few days, but what exactly is so terrible about keeping the atoms >>> (since you have chemical evidence from protein sequence that they are >>> there, and even if there is X-ray damage they were originally there and >>> are likely still there in a subset of the molecules), but changing >>> occupancy to zero as an acknowledgment that your data does not provide >>> evidence to support a specific atomic position for these atoms? >> >> Some users might pull up the structure, see those atoms, and think >> their positions were based on data, which they were not, and then draw >> conclusions based on them. I agree that occ=0 is tantamount to the >> suggestion you queried, however. >> >> A somewhat key question might be: across the various molecular >> visualization programs, what is the default way to handle atoms with >> occ=0? Perhaps those programs might be the best place to fix the >> problem... >> >> JPK >> >> >> *** >> Jacob Pearson Keller >> Northwestern University >> Medical Scientist Training Program >> cel: 773.608.9185 >> email: j-kell...@northwestern.edu >> *** >>
Re: [ccp4bb] what to do with disordered side chains
Doing something sensible in the major software packages, both for graphics and for other analysis of the structure, could solve the problem for most users. But nobody knows what other software is out there being used by individuals or small groups. And the more remote the authors of that software are from protein structure solution the more likely it is that they have not/will not properly handle atoms with zero occupancy or high B values, for example. I am absolutely positive that there is software that does its voodoo on ATOM/HETATM records and pays absolutely no attention to anything beyond the x, y, z coordinates (i.e. beyond column 54). Frances Bernstein = Bernstein + Sons * * Information Systems Consultants 5 Brewster Lane, Bellport, NY 11713-2803 * * *** *Frances C. Bernstein * *** f...@bernstein-plus-sons.com *** * * *** 1-631-286-1339FAX: 1-631-286-1999 = On Sat, 2 Apr 2011, Jacob Keller wrote: I guess I missed it in the flurry of replies to this thread over the last few days, but what exactly is so terrible about keeping the atoms (since you have chemical evidence from protein sequence that they are there, and even if there is X-ray damage they were originally there and are likely still there in a subset of the molecules), but changing occupancy to zero as an acknowledgment that your data does not provide evidence to support a specific atomic position for these atoms? Some users might pull up the structure, see those atoms, and think their positions were based on data, which they were not, and then draw conclusions based on them. I agree that occ=0 is tantamount to the suggestion you queried, however. A somewhat key question might be: across the various molecular visualization programs, what is the default way to handle atoms with occ=0? Perhaps those programs might be the best place to fix the problem... JPK *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
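To make the "beyond column 54" point concrete, here is a minimal sketch of the two kinds of reader described above. It assumes the standard fixed-column ATOM/HETATM layout (x, y, z in columns 31-54, occupancy in 55-60, B in 61-66); the names `read_xyz_only`, `read_full` and `AtomRecord` are made up for this illustration and do not refer to any existing program.

```python
from typing import NamedTuple, Tuple


class AtomRecord(NamedTuple):
    atom_name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: Tuple[float, float, float]
    occupancy: float
    b_factor: float


def read_xyz_only(line: str) -> Tuple[float, float, float]:
    """Mimic a reader that stops at column 54: coordinates and nothing else."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def read_full(line: str) -> AtomRecord:
    """Read the same record including the occupancy and B columns."""
    return AtomRecord(
        atom_name=line[12:16].strip(),   # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain=line[21],                  # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        xyz=read_xyz_only(line),         # columns 31-54
        occupancy=float(line[54:60]),    # columns 55-60
        b_factor=float(line[60:66]),     # columns 61-66
    )
```

A program built on something like `read_xyz_only` returns a perfectly ordinary coordinate triple for an atom deposited with occupancy 0.00 or B of several hundred, so neither convention can reach it; only a reader like `read_full` even sees the values.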
Re: [ccp4bb] what to do with disordered side chains
> I guess I missed it in the flurry of replies to this thread over the > last few days, but what exactly is so terrible about keeping the atoms > (since you have chemical evidence from protein sequence that they are > there, and even if there is X-ray damage they were originally there and > are likely still there in a subset of the molecules), but changing > occupancy to zero as an acknowledgment that your data does not provide > evidence to support a specific atomic position for these atoms? Some users might pull up the structure, see those atoms, and think their positions were based on data, which they were not, and then draw conclusions based on them. I agree that occ=0 is tantamount to the suggestion you queried, however. A somewhat key question might be: across the various molecular visualization programs, what is the default way to handle atoms with occ=0? Perhaps those programs might be the best place to fix the problem... JPK *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
Re: [ccp4bb] what to do with disordered side chains
> >Just create a new tag, say _atom_site.imaginary_site, which is either true or false for every atom. Then everyone would be able to either filter out the fake atoms or leave them in, without ambiguity or confusion. > Aside from being a binary rather than continuous parameter, how exactly does this suggestion differ from the occupancy column of the pdb, that we already have? I guess I missed it in the flurry of replies to this thread over the last few days, but what exactly is so terrible about keeping the atoms (since you have chemical evidence from protein sequence that they are there, and even if there is X-ray damage they were originally there and are likely still there in a subset of the molecules), but changing occupancy to zero as an acknowledgment that your data does not provide evidence to support a specific atomic position for these atoms? Evette S. Radisky, Ph.D. Assistant Professor Mayo Clinic Cancer Center Griffin Cancer Research Building, Rm 310 4500 San Pablo Road Jacksonville, FL 32224 (904) 953-6372
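One way to look at the question of how a true/false tag would differ from the occupancy column is to apply both filters to the same atoms. The sketch below is purely illustrative: `AtomSite`, its `imaginary` field (modeled loosely on the proposed `_atom_site.imaginary_site` tag, which is not an existing dictionary item) and the example atoms are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AtomSite:
    label: str        # free-text label, e.g. "LYS A 10 NZ"
    occupancy: float  # continuous value, 0.0 to 1.0
    imaginary: bool   # hypothetical flag: True if not placed from density


def drop_imaginary(sites: Iterable[AtomSite]) -> List[AtomSite]:
    """Keep only atoms the depositor says were built from density."""
    return [s for s in sites if not s.imaginary]


def drop_zero_occupancy(sites: Iterable[AtomSite]) -> List[AtomSite]:
    """Keep only atoms with non-zero occupancy."""
    return [s for s in sites if s.occupancy > 0.0]
```

If every unobserved atom is deposited with occupancy 0 and every observed atom with occupancy above 0, the two filters return the same atoms and the flag adds nothing. They diverge only when a depositor keeps an unobserved atom at non-zero occupancy (for instance with a very high B), which is the case a separate flag would make explicit.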
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Personally I think it is a _good_ thing that those missing atoms are a pain, because it helps ensure you are aware of the problem. As somebody who is in the business of supplying non-structural people with models, and seeing how those models are sometimes (mis)interpreted, I think it's better to inflict that pain than it is to present a model that non-structural people are likely to over-interpret.

The PDB provides various manipulated versions of crystal structures, such as biological assemblies. I don't think it would necessarily be a bad idea to build missing atoms back into those sorts of processed files, but for the main deposited entry the best way to make sure the model is not abused is to leave out atoms that can't be modeled accurately.

Just as an example, since you mention surfaces: some of the people I work with calculate solvent accessible surface areas of individual residues for purposes such as engineering cysteines for chemical conjugation. If residues are modeled into bogus positions just to say all the atoms are there, software that calculates per-residue SASA has to have a reliable way of knowing to ignore those atoms when calculating the area of neighboring residues. Ad hoc solutions like putting very large values in the B column are not clear cut for such a software program to interpret. Leaving the atom out completely is pretty unambiguous.

-Eric

On Mar 31, 2011, at 7:34 PM, Scott Pegan wrote:

> I agree with Zbyszek with the modeling of side chains and stress the
> following points:
>
> 1) It drives me nuts when I find that a PDB file is missing atoms from side
> chains. This requires me to rebuild them to get any use out of the PDB file,
> such as relevant surface renderings or electrostatic potential plots. I am
> an experienced structural biologist, so I can immediately identify that
> they have been removed and can rebuild them. I feel sorry for my fellow
> scientists from other biological fields who can't perform this task readily;
> thus removing these atoms from a model limits its usefulness to a wider
> scientific audience.
>
> 2) Not sure if anyone has documented the percentage of actual side chains
> missing from radiation damage versus heterogeneity in conformation (i.e.
> dissolved a crystal after collection and sent it to Mass Spec). Although
> the former likely happens occasionally, my gut tells me that the latter is
> significantly more predominant. As a result, absence of atoms from a side
> chain in the PDB where the main chain is clearly visible in the electron
> density might make for the best statistics for an experimental model, but
> does not reflect reality.
>
> Scott
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
In this case, I'm more on ZO's side. Let's say that the refinement program can't get an atom to the right position (for instance, to pick a reasonably realistic example, because you've put a leucine side chain in backwards). In that case, the B-factor for the atom nearest to where there should be one in the structure will get larger to smear out its density and put some in the right place. To a good approximation, the optimal increase in the B-factor will be the one you'd expect for a Gaussian probability distribution, i.e. 8Pi^2/3 times the positional error squared. So a refined B-factor does include a measure of the uncertainty or error in the atom's position. Best wishes, Randy Read On Apr 1 2011, James Holton wrote: I'm not sure I entirely agree with ZO's assessment that a B factor is a measure of uncertainty. Pedantically, all it really is is an instruction to the refinement program to "build" some electron density with a certain width and height at a certain location. The result is then compared to the data, parameters are adjusted, etc. I don't think the B factor is somehow converted into an "error bar" on the calculated electron density, is it? For example, a B-factor of 500 on a carbon atom just means that the "peak" to build is ~0.02 electron/A^3 tall, and ~3 A wide (full width at half maximum). By comparison, a carbon with B=20 is 1.6 electrons/A^3 tall and ~0.7 A wide (FWHM). One of the "bugs" that Dale referred to is the fact that most refinement programs do not "plot" electron density more than 3 A away from each atomic center, so a substantial fraction of the 6 electrons represented by a carbon with B=500 will be sharply "cut off", and missing from the FC calculation. Then again, all 6 electrons will be missing if the atoms are simply not modeled, or if the occupancy is zero. 
The point I am trying to make here is that there is no B factor that will make an atom "go away", because the way B factors are implemented is to always conserve the total number of electrons in the atom, but just spread them out over more space. Now, a peak height of 0.02 electrons/A^3 may sound like it might as well be zero, especially when sitting next to a B=20 atom, but what if all the atoms have high B factors? For example, if the average (Wilson) B factor is 80 (like it typically is for a ~4A structure), then the average peak height of a carbon atom is 0.3 electrons/A^3, and then 0.02 electrons/A^3 starts to become more significant. If we consider a ~11 A structure, then the average atomic B factor will be around 500. This "B vs resolution" relationship is something I derived empirically from the PDB (Holton JSR 2009). Specifically, the average B factor for PDB files at a given resolution "d" is: B = 4*d^2+12. Admittedly, this is "on average", but the trend does make physical sense: atoms with high B factors don't contribute very much to high-angle spots. More formally, the problem with using a high B-factor as a "flag" is that it is not resolution-general. Dale has already pointed this out. Personally, I prefer to think of B factors as an atom-by-atom "resolution" rather than an "error bar", and this is how I tell students to interpret them (using the B = 4*d^2+12 formula). The problem I have with the "error bar" interpretation is that heterogeneity and uncertainty are not the same thing. That is, just because the atom is "jumping around" does not mean you don't know where the centroid of the distribution is. The "u_x" in B = 8*pi^2*⟨u_x^2⟩ does reflect the standard error of atomic position in a GIVEN unit cell, but since we are averaging over trillions of cells, the "error bar" on the AVERAGE atomic position is actually a great deal smaller than "u". 
I think this distinction is important because what we are building is a model of the AVERAGE electron density, not a single molecule. Just my 0.02 electrons -James Holton MAD Scientist On Fri, Apr 1, 2011 at 10:57 AM, Zbyszek Otwinowski wrote: The meaning of B-factor is the (scaled) sum of all positional uncertainties, and not just its one contributor, the Atomic Displacement Parameter that describes the relative displacement of an atom in the crystal lattice by a Gaussian function. That meaning (the sum of all contributions) comes from the procedure that calculates the B-factor in all PDB X-ray deposits, and not from an arbitrary decision by a committee. All programs that refine B-factors calculate an estimate of positional uncertainty, where contributors can be both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g. multiple occupancy, the exact numerical contribution is rather a complex function, but conceptually it is still an uncertainty estimate. Given the resolution of the typical data, we do not have a procedure to decouple Gaussian and non-Gaussian contributors, so we have to live with the B-factor being defined by the refinement procedure. However, we should still improve the estimates of the B-fac
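Randy Read's rule of thumb at the top of this message (the optimal B increase is 8*pi^2/3 times the positional error squared) can be turned into a two-line conversion. This is a sketch under one stated assumption: that "positional error squared" means the mean-square three-dimensional displacement of the atom from its true position, in A^2. The function names are invented for this example.

```python
import math

# Factor relating B (in A^2) to the mean-square 3D positional error (in A^2).
B_PER_SQUARE_ERROR = 8.0 * math.pi ** 2 / 3.0


def b_from_rms_error(rms_error: float) -> float:
    """B-factor contribution expected for a given rms 3D positional error (A)."""
    return B_PER_SQUARE_ERROR * rms_error ** 2


def rms_error_from_b(b_factor: float) -> float:
    """Inverse relation: rms 3D positional error (A) implied by a B value."""
    if b_factor < 0.0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / B_PER_SQUARE_ERROR)
```

Under this reading, an atom misplaced by about 1 A would be expected to pick up roughly 26 A^2 of extra B, which is the sense in which a refined B-factor carries information about positional error as well as displacement.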
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
I'm not sure I entirely agree with ZO's assessment that a B factor is a measure of uncertainty. Pedantically, all it really is is an instruction to the refinement program to "build" some electron density with a certain width and height at a certain location. The result is then compared to the data, parameters are adjusted, etc. I don't think the B factor is somehow converted into an "error bar" on the calculated electron density, is it? For example, a B-factor of 500 on a carbon atom just means that the "peak" to build is ~0.02 electron/A^3 tall, and ~3 A wide (full width at half maximum). By comparison, a carbon with B=20 is 1.6 electrons/A^3 tall and ~0.7 A wide (FWHM). One of the "bugs" that Dale referred to is the fact that most refinement programs do not "plot" electron density more than 3 A away from each atomic center, so a substantial fraction of the 6 electrons represented by a carbon with B=500 will be sharply "cut off", and missing from the FC calculation. Then again, all 6 electrons will be missing if the atoms are simply not modeled, or if the occupancy is zero. The point I am trying to make here is that there is no B factor that will make an atom "go away", because the way B factors are implemented is to always conserve the total number of electrons in the atom, but just spread them out over more space. Now, a peak height of 0.02 electrons/A^3 may sound like it might as well be zero, especially when sitting next to a B=20 atom, but what if all the atoms have high B factors? For example, if the average (Wilson) B factor is 80 (like it typically is for a ~4A structure), then the average peak height of a carbon atom is 0.3 electrons/A^3, and then 0.02 electrons/A^3 starts to become more significant. If we consider a ~11 A structure, then the average atomic B factor will be around 500. This "B vs resolution" relationship is something I derived empirically from the PDB (Holton JSR 2009). 
Specifically, the average B factor for PDB files at a given resolution "d" is: B = 4*d^2+12. Admittedly, this is "on average", but the trend does make physical sense: atoms with high B factors don't contribute very much to high-angle spots. More formally, the problem with using a high B-factor as a "flag" is that it is not resolution-general. Dale has already pointed this out. Personally, I prefer to think of B factors as an atom-by-atom "resolution" rather than an "error bar", and this is how I tell students to interpret them (using the B = 4*d^2+12 formula). The problem I have with the "error bar" interpretation is that heterogeneity and uncertainty are not the same thing. That is, just because the atom is "jumping around" does not mean you don't know where the centroid of the distribution is. The "u_x" in B = 8*pi^2*⟨u_x^2⟩ does reflect the standard error of atomic position in a GIVEN unit cell, but since we are averaging over trillions of cells, the "error bar" on the AVERAGE atomic position is actually a great deal smaller than "u". I think this distinction is important because what we are building is a model of the AVERAGE electron density, not a single molecule. Just my 0.02 electrons -James Holton MAD Scientist On Fri, Apr 1, 2011 at 10:57 AM, Zbyszek Otwinowski wrote: > The meaning of B-factor is the (scaled) sum of all positional > uncertainties, and not just its one contributor, the Atomic Displacement > Parameter that describes the relative displacement of an atom in the > crystal lattice by a Gaussian function. > That meaning (the sum of all contributions) comes from the procedure that > calculates the B-factor in all PDB X-ray deposits, and not from an > arbitrary decision by a committee. All programs that refine B-factors > calculate an estimate of positional uncertainty, where contributors can be > both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g. 
> multiple occupancy, the exact numerical contribution is rather a complex > function, but conceptually it is still an uncertainty estimate. Given the > resolution of the typical data, we do not have a procedure to decouple > Gaussian and non-Gaussian contributors, so we have to live with the > B-factor being defined by the refinement procedure. However, we should > still improve the estimates of the B-factor, e.g. by changing the > restraints. In my experience, the Refmac's default restraints on B-factors > in side chains are too tight and I adjust them. Still, my preference would > be to have harmonic restraints on U (square root of B) rather than on Bs > themselves. > It is not we who cram too many meanings on the B-factor, it is the quite > fundamental limitation of crystallographic refinement. > > Zbyszek Otwinowski > >> The fundamental problem remains: we're cramming too many meanings into > one number [B factor]. This the PDB could indeed solve, by giving us > another column. (He said airily, blithely launching a totally new flame > war.) >> phx. >> >
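The two formulas in James's message, B = 4*d^2+12 and B = 8*pi^2*<u_x^2>, are easy to play with numerically. The following is only a minimal Python sketch of that arithmetic; the function names are made up for illustration and do not come from any crystallographic package.

```python
import math

def average_b_from_resolution(d):
    """Empirical average B factor (A^2) at resolution d (A): B = 4*d^2 + 12."""
    return 4.0 * d ** 2 + 12.0

def resolution_from_b(b):
    """Invert B = 4*d^2 + 12 to read a B factor as an atom-by-atom 'resolution' (A)."""
    if b < 12.0:
        raise ValueError("the rule of thumb is not defined for B < 12")
    return math.sqrt((b - 12.0) / 4.0)

def rms_displacement(b):
    """RMS displacement along one axis (A), from B = 8*pi^2*<u_x^2>."""
    return math.sqrt(b / (8.0 * math.pi ** 2))

# Tabulate the rule of thumb for a few resolutions.
for d in (2.0, 4.0, 11.0):
    b = average_b_from_resolution(d)
    print(f"d = {d:4.1f} A  ->  B = {b:5.1f} A^2,  u_x = {rms_displacement(b):.2f} A")
```

With these definitions a 4 A structure comes out at B = 76 and an 11 A structure at B = 496, in line with the "about 80" and "around 500" quoted in the message.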
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
> In my experience, the Refmac's default restraints on B-factors in side chains > are too tight and I adjust them. Concur. See BMC p 640. BR
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
The meaning of B-factor is the (scaled) sum of all positional uncertainties, and not just its one contributor, the Atomic Displacement Parameter that describes the relative displacement of an atom in the crystal lattice by a Gaussian function. That meaning (the sum of all contributions) comes from the procedure that calculates the B-factor in all PDB X-ray deposits, and not from an arbitrary decision by a committee. All programs that refine B-factors calculate an estimate of positional uncertainty, where contributors can be both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g. multiple occupancy, the exact numerical contribution is rather a complex function, but conceptually it is still an uncertainty estimate. Given the resolution of the typical data, we do not have a procedure to decouple Gaussian and non-Gaussian contributors, so we have to live with the B-factor being defined by the refinement procedure. However, we should still improve the estimates of the B-factor, e.g. by changing the restraints. In my experience, the Refmac's default restraints on B-factors in side chains are too tight and I adjust them. Still, my preference would be to have harmonic restraints on U (square root of B) rather than on Bs themselves. It is not we who cram too many meanings on the B-factor, it is the quite fundamental limitation of crystallographic refinement. Zbyszek Otwinowski > The fundamental problem remains: we're cramming too many meanings into one number [B factor]. This the PDB could indeed solve, by giving us another column. (He said airily, blithely launching a totally new flame war.) > phx. >
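One way to picture how a non-Gaussian contributor such as multiple occupancy adds to a single B value is to treat B as a scaled second moment, B = 8*pi^2*<u^2>, and compute the second moment of a mixture of alternate conformers along one axis. The sketch below does only that idealized moment calculation. It is an assumption-laden illustration, not what any refinement program computes (the message above notes that the exact contribution is a rather complex function), and the function name is hypothetical.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2

def mixture_b_factor(conformers):
    """Second-moment B (A^2) along one axis for a set of alternate conformers.

    conformers: list of (occupancy, position_in_A, b_factor_in_A2) tuples,
    with occupancies summing to 1. Each conformer contributes its own
    variance (B / 8 pi^2) plus its squared offset from the mean position.
    """
    total_occ = sum(occ for occ, _, _ in conformers)
    if not math.isclose(total_occ, 1.0, abs_tol=1e-6):
        raise ValueError("occupancies must sum to 1")
    mean = sum(occ * x for occ, x, _ in conformers)
    variance = sum(
        occ * (b / EIGHT_PI_SQ + (x - mean) ** 2) for occ, x, b in conformers
    )
    return EIGHT_PI_SQ * variance

# Two half-occupied conformers 1 A apart, each with B = 20 A^2.
print(round(mixture_b_factor([(0.5, 0.0, 20.0), (0.5, 1.0, 20.0)]), 1))
```

In this toy case the 1 A split alone adds 0.25 * 8*pi^2, roughly 20 A^2, on top of the B = 20 of each conformer, which gives a feel for how quickly unmodeled alternates inflate a single refined B.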
Re: [ccp4bb] Jrh input Re: [ccp4bb] what to do with disordered side chains
Dear John, there may be reasons to disagree with both options. This has been a recurring discussion for many years, and in my mind the most convincing arguments for both sides are as follows: "Keepers": I know the side chain is there and the high ADP is a good approximation of reality. Removing atoms causes such a mess for the end user. "Deleters": We don't model missing loops, termini, ligands and waters when there is no density, and side chains should not be treated differently. Most end users think ADP is a nucleotide and will over-interpret the model. I am a "keeper" when it comes to end user treatment, but a recently converted "deleter" when it comes to modeling (a rather stressful position). So I am not taking sides really, but rather looking for a middle way. (Have to admit that my secret goal was to knock down the zero occupancy fallacy :) Perhaps these ideas are worth exploring: 1. Provide dual representation - a crystallographic model and an end-user model, both downloadable from the PDB. 2. Model missing side chains "NMR-way" 3. A new data file format is needed (mmCIF?) that combines atomic model with electron density, and visualization/analysis software shall be modified to always utilize the experimental data 4. Implement reduced ADP restraints for disordered side chains to further reduce model bias But ultimately, as long as experimental data is deposited, I believe that people are free to interpret their data the way they see fit. Others are then free to look at the electron density and become outraged at the interpretation. Cheers, Ed. On Thu, 2011-03-31 at 23:25 +0100, Jrh wrote: > Dear Ed, > Thankyou for this and apologies for late reply. > If one has chemical evidence for the presence of residues but these > residues are disordered I find the delete atoms option disagreeable. > Such a static disorder situation should be described by a high atomic > displacement parameter, in my view. (nb the use of ADP is better than > B factor terminology). 
> Yours sincerely, > John > Prof John R Helliwell DSc > -- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Hi Robbie > If it's probability you're after, if there's no density to guide you > (very common!) you'd have to place all "likely" rotamers that don't > clash with anything, and set their occupancies to their probability (as > encoded in the rotamer library). Which library? The one for all side chains of a specific type, or the one for a specific type with a given backbone conformation? These are quite different and change with the content of the PDB. 'Hacking' the occupancies is risky business in general: errors are made quite easily. I frequently encounter side chains with partial occupancies but no alternatives; how can I relate this to the experimental data? Even worse, I also see cases where the occupancies of alternates sum up to values > 1.00. What does that mean? Is that a local increase of dark matter accidentally encoded in the occupancy? Actually, I wasn't advocating it - I was taking ZO's suggestion to its logical conclusion to point out the problem, namely deciding what is "most likely". This you underline with your (very valid) question. > Until the PDB is expanded, the conventions need to be clear, and I > thought they were: > High B-factor ==> means atom is there but density is weak > Atom missing ==> no density to support it. Unfortunately, it is not trivial to decide when there is 'no density'. We must have a good metric to do this, but I don't think it exists yet. Removing atoms is thus very subjective. This explains why I frequently find positive difference density peaks near missing side chains. Leaving side chains in sometimes gives negative difference density but refining them with proper B-factor restraints reduces the problem a lot. There is still the problem of radiation damage, but that is relatively small. At least refining the B-factor is more reproducible and less subjective than making the binary choice to keep or remove an atom. (Radiation damage is NOT a "relatively small" problem.)
The fundamental problem remains: we're cramming too many meanings into one number. This the PDB could indeed solve, by giving us another column. (He said airily, blithely launching a totally new flame war.) phx.
Re: [ccp4bb] what to do with disordered side chains
Dear Gerard, I agree with you based on debates at some conferences. But, based on what I have seen here so far, it seems to me that everybody knows exactly what to do with disordered side chains. People that want to build structures to best fit the data tend to prefer omitting disordered side chains. On the other hand, people that want to build structures to best represent reality tend to prefer building them. I don't see any disagreement here nor do I see any problems with either approach. Different people collect the same data to study different things and I feel that they are entitled to view and interpret the data the way that they find most meaningful. Equations are attempts to describe reality; I don't see why we should constrain reality to fit equations. Cheers, Quyen On Mar 31, 2011, at 12:21 PM, Gerard Bricogne wrote: > Dear Quyen, > > On Thu, Mar 31, 2011 at 11:27:58AM -0400, Quyen Hoang wrote: >> Thank you for your post, Herman. >> Since there is no holy bible to provide guidance, perhaps we should hold >> off the idea of electing a "powerful dictator" to enforce this? >> - at least until we all can come to a consensus on how the "dictator" >> should dictate... >> > > ... but that might well be even harder than to decide what to do with > disordered side chains ... . > > > With best wishes, > > Gerard. > > -- > > === > * * > * Gerard Bricogne g...@globalphasing.com * > * * > * Global Phasing Ltd. * > * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * > * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * > * * > === >> >> >> On Mar 31, 2011, at 10:22 AM, herman.schreu...@sanofi-aventis.com wrote: >> >>> Dear Quyen, >>> I am afraid you won't get any better answers than you got so far. There is >>> no holy bible telling you what to do with disordered side chains. I fully >>> agree with James that you should try to get the best possible model, which >>> best explains your data and that will be your decision.
Here are my 2 >>> cents: >>> >>> -If you see alternative positions, you have to build them. >>> -If you do not see alternative positions, I would not replace one fantasy >>> (some call it most likely) orientation with 2 or 3 fantasy orientations. >>> -I personally belong to the "let the B-factors take care of it" camp, but >>> that is my personal opinion. Leaving side chains out could lead to >>> misinterpretations by slightly less savvy users of our data, especially >>> when charge distributions are being studied. Besides, we know (almost) for >>> sure that the side chain is there, it is only disordered and as we just >>> learned, even slightly less savvy users know what flaming red side chains >>> mean. Even if they may not be mathematically entirely correct, huge >>> B-factors clearly indicate that there is disorder involved. >>> -I would not let occupancies take up the slack since even very savvy users >>> have never heard of them and again, the side chain is fully occupied, only >>> disordered. Of course if you build alternate positions, you have to divide >>> the occupancies amongst them. >>> >>> Best, >>> Herman >>> >>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of >>> Quyen Hoang >>> Sent: Thursday, March 31, 2011 3:55 PM >>> To: CCP4BB@JISCMAIL.AC.UK >>> Subject: Re: [ccp4bb] what to do with disordered side chains >>> >>> We are getting off topic a little bit. >>> >>> Original topic: is it better to not build disordered sidechains or build >>> them and let B-factors take care of it? >>> Ed's poll got almost a 50:50 split. >>> Question still unanswered. >>> >>> Second topic introduced by Pavel: "Your B-factors are valid within a >>> harmonic (small) approximation of atomic vibrations. Larger scale motions >>> you are talking about go beyond the harmonic approximation, and using the >>> B-factor to model them is abusing the corresponding mathematical model." >>> And that these large scale motions (disorders) a
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Hi Frank, > > I described in the previous e-mail the probabilistic interpretation of > > B-factors. In the case of very high uncertainty = poorly ordered side > > chains, I prefer to deposit the conformer representing maximum a > > posteriori, even if it does not represent all possible conformations. > > Maximum a posteriori will have significant contribution from the most > > probable conformation of side chain (prior knowledge) and should not > > conflict with likelihood (electron density map). > > Thus, in practice I model the most probable conformation as long as it > > is in even very weak electron density, does not overlap significantly > > with negative difference electron density and does not clash with other > > residues. > If it's probability you're after, if there's no density to guide you > (very common!) you'd have to place all "likely" rotamers that don't > clash with anything, and set their occupancies to their probability (as > encoded in the rotamer library). Which library? The one for all side chains of a specific type, or the one for a specific type with a given backbone conformation? These are quite different and change with the content of the PDB. 'Hacking' the occupancies is risky business in general: errors are made quite easily. I frequently encounter side chains with partial occupancies but no alternatives; how can I relate this to the experimental data? Even worse, I also see cases where the occupancies of alternates sum up to values > 1.00. What does that mean? Is that a local increase of dark matter accidentally encoded in the occupancy? > This is now veering into data-free protein modeling territory... wasn't > the idea to present to the downstream user an atomic representation of > what the electron density shows us? Yes, but what we see can be deceiving. > Worse, what we're also doing is encoding multiple different things in > one place - what database people call "poorly normalised", i.e.
to > understand a data field requires further parsing and if statements. In > this case: to know whether there was no density, as end-user I'd have > to second-guess what exactly those > high-B-factor-variable-occupancy atoms mean. > > Until the PDB is expanded, the conventions need to be clear, and I > thought they were: > High B-factor ==> means atom is there but density is weak > Atom missing ==> no density to support it. Unfortunately, it is not trivial to decide when there is 'no density'. We must have a good metric to do this, but I don't think it exists yet. Removing atoms is thus very subjective. This explains why I frequently find positive difference density peaks near missing side chains. Leaving side chains in sometimes gives negative difference density but refining them with proper B-factor restraints reduces the problem a lot. There is still the problem of radiation damage, but that is relatively small. At least refining the B-factor is more reproducible and less subjective than making the binary choice to keep or remove an atom. Cheers, Robbie > > Oh well... > phx.
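The two occupancy problems Robbie describes (partial occupancy with no alternate conformation, and alternates whose occupancies sum to more than 1.00) can be screened for mechanically. The following Python sketch shows one way to do it from the fixed-column ATOM/HETATM records of a PDB file. It assumes a single model with well-formed records, the function name and the 0.01 tolerance are illustrative choices, and it is not taken from any validation package.

```python
from collections import defaultdict

def check_occupancies(pdb_lines, tol=0.01):
    """Return a list of ((chain, residue id, atom name), problem) pairs."""
    sums = defaultdict(float)    # total occupancy per atom over all altlocs
    altlocs = defaultdict(set)   # altloc indicators seen per atom
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        altloc = line[16]                 # column 17
        chain = line[21]                  # column 22
        res_id = line[22:27].strip()      # columns 23-27: number + insertion code
        occupancy = float(line[54:60])    # columns 55-60
        key = (chain, res_id, atom_name)
        sums[key] += occupancy
        altlocs[key].add(altloc)

    problems = []
    for key, total in sums.items():
        has_alternates = altlocs[key] != {" "}
        if total > 1.0 + tol:
            problems.append((key, f"occupancies sum to {total:.2f}"))
        elif total < 1.0 - tol and not has_alternates:
            problems.append((key, f"occupancy {total:.2f} with no alternate conformation"))
    return problems

# Usage (hypothetical file name):
# with open("model.pdb") as handle:
#     for atom, problem in check_occupancies(handle):
#         print(atom, problem)
```

Note that this only reports the bookkeeping inconsistency. Whether a total below 1.00 is an error or a deliberate statement (radiation damage, partial Se-Met incorporation) still has to be judged against the density.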
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dear All, just in time for the heated discussion of missing density, high B-factors, and split conformations I have a paper in the most recent Phys Rev Letters that provides a rational explanation for the observation of split conformations when the equilibrium density becomes weak. The first page is available from my web site, http://www.ruppweb.org/select/Rupp_2011_Phys_Rev_Letters_160_13_B-factor.pdf but for copyright reasons I ask that you please email me for the entire paper. Comments are welcome, BR - Bernhard Rupp 001 (925) 209-7429 +43 (676) 571-0536 b...@ruppweb.org hofkristall...@gmail.com http://www.ruppweb.org/ - People can be divided in three classes: The few who make things happen The many who watch things happen And the overwhelming majority who have no idea what is happening. -
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
On 31/03/2011 23:43, Zbyszek Otwinowski wrote: Regarding the closing statement about the best solution to poorly ordered side chains: I described in the previous e-mail the probabilistic interpretation of B-factors. In the case of very high uncertainty = poorly ordered side chains, I prefer to deposit the conformer representing maximum a posteriori, even if it does not represent all possible conformations. Maximum a posteriori will have significant contribution from the most probable conformation of side chain (prior knowledge) and should not conflict with likelihood (electron density map). Thus, in practice I model the most probable conformation as long as it is in even very weak electron density, does not overlap significantly with negative difference electron density and does not clash with other residues. If it's probability you're after, if there's no density to guide you (very common!) you'd have to place all "likely" rotamers that don't clash with anything, and set their occupancies to their probability (as encoded in the rotamer library). This is now veering into data-free protein modeling territory... wasn't the idea to present to the downstream user an atomic representation of what the electron density shows us? Worse, what we're also doing is encoding multiple different things in one place - what database people call "poorly normalised", i.e. to understand a data field requires further parsing and if statements. In this case: to know whether there was no density, as end-user I'd have to second-guess what exactly those high-B-factor-variable-occupancy atoms mean. Until the PDB is expanded, the conventions need to be clear, and I thought they were: High B-factor ==> means atom is there but density is weak Atom missing ==> no density to support it. Oh well... phx.
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
I agree with Zbyszek on the modeling of side chains and stress the following points: 1) It drives me nuts when I find that a PDB file is missing atoms from side chains. This requires me to rebuild them to get any use out of the PDB file, such as relevant surface renderings or electrostatic potential plots. I am an experienced structural biologist, so I can immediately identify that they have been removed and can rebuild them. I feel sorry for my fellow scientists from other biological fields who can't perform this task readily; removing these atoms from a model thus limits its usefulness to a wider scientific audience. 2) Not sure if anyone has documented the percentage of actual side chains missing from radiation damage versus heterogeneity in conformation (i.e. dissolved a crystal after collection and sent it to Mass Spec). Although the former likely happens occasionally, my gut tells me that the latter is significantly more predominant. As a result, absence of atoms from a side chain in the PDB where the main chain is clearly visible in the electron density might make for the best statistics for an experimental model, but does not reflect reality. Scott On Thu, Mar 31, 2011 at 4:43 PM, Zbyszek Otwinowski wrote: > Regarding the closing statement about the best solution to poorly ordered > side chains: > > I described in the previous e-mail the probabilistic interpretation of > B-factors. In the case of very high uncertainty = poorly ordered side > chains, I prefer to deposit the conformer representing maximum a posteriori, > even if it does not represent all possible conformations. > Maximum a posteriori will have significant contribution from the most > probable conformation of side chain (prior knowledge) and should not > conflict with likelihood (electron density map).
> Thus, in practice I model the most probable conformation as long as it is > in even very weak electron density, does not overlap significantly with > negative difference electron density and does not clash with other residues. > > As a user of PDB files I much prefer the simplest and the most informative > representation of the result. Removing parts of side chains that carry > charges, as already mentioned, is not particularly helpful for the > downstream uses. NMR-like deposits are not among my favorites, either. > Having multiple conformations with low occupancies increases potential for a > confusion, while benefits are not clear to me. > > Zbyszek > > Frank von Delft wrote: > >> This is a lovely summary, and we should make our students read it. - But >> I'm afraid I do not see how it supports the closing statement in the last >> paragraph... phx. >> >> >> On 31/03/2011 17:06, Zbyszek Otwinowski wrote: >> >>> The B-factor in crystallography represents the convolution (sum) of two >>> types of uncertainties about the atom (electron cloud) position: >>> >>> 1) dispersion of atom positions in crystal lattice >>> 2) uncertainty of the experimenter's knowledge about the atom position. >>> >>> In general, uncertainty need not be described by a Gaussian function. >>> However, communicating uncertainty using the second moment of its >>> distribution is a widely accepted practice, with frequently implied >>> meaning that it corresponds to a Gaussian probability function. B-factor >>> is simply a scaled (by 8 times pi squared) second moment of uncertainty >>> distribution. >>> >>> In the previous, long thread, confusion was generated by the additional >>> assumption that B-factor also corresponds to a Gaussian probability >>> distribution and not just to a second moment of any probability >>> distribution.
Crystallographic literature often implies the Gaussian >>> shape, so there is some justification for such an interpretation, where >>> the more complex probability distribution is represented by the sum of >>> displaced Gaussians, where the area under each Gaussian component >>> corresponds to the occupancy of an alternative conformation. >>> >>> For data with a typical resolution for macromolecular crystallography, >>> such multi-Gaussian description of the atom position's uncertainty is not >>> practical, as it would lead to instability in the refinement and/or >>> overfitting. Due to this, a simplified description of the atom's position >>> uncertainty by just the second moment of probability distribution is the >>> right approach. For this reason, the PDB format is highly suitable for >>> the >>> description of positional uncertainties, the only difference with other >>> fields being the unusual form of squaring and then scaling up the >>> standard >>> uncertainty. As this calculation can be easily inverted, there is no loss >>> of information. However, in teaching one should probably stress more this >>> unusual form of presenting the standard deviation. >>> >>> A separate issue is the use of restraints on B-factor values, a subject >>> that probably needs a longer discussion. >>> >>> With respect to the previous thre
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Regarding the closing statement about the best solution to poorly ordered side chains: I described in the previous e-mail the probabilistic interpretation of B-factors. In the case of very high uncertainty = poorly ordered side chains, I prefer to deposit the conformer representing maximum a posteriori, even if it does not represent all possible conformations. Maximum a posteriori will have significant contribution from the most probable conformation of side chain (prior knowledge) and should not conflict with likelihood (electron density map). Thus, in practice I model the most probable conformation as long as it is in even very weak electron density, does not overlap significantly with negative difference electron density and does not clash with other residues. As a user of PDB files I much prefer the simplest and the most informative representation of the result. Removing parts of side chains that carry charges, as already mentioned, is not particularly helpful for the downstream uses. NMR-like deposits are not among my favorites, either. Having multiple conformations with low occupancies increases potential for a confusion, while benefits are not clear to me. Zbyszek Frank von Delft wrote: This is a lovely summary, and we should make our students read it. - But I'm afraid I do not see how it supports the closing statement in the last paragraph... phx. On 31/03/2011 17:06, Zbyszek Otwinowski wrote: The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position: 1) dispersion of atom positions in crystal lattice 2) uncertainty of the experimenter's knowledge about the atom position. In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with frequently implied meaning that it corresponds to a Gaussian probability function.
B-factor is simply a scaled (by 8 times pi squared) second moment of uncertainty distribution. In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation. For data with a typical resolution for macromolecular crystallography, such multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress more this unusual form of presenting the standard deviation. A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion. With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem. Zbyszek Otwinowski - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains. 
But this "knowledge" may be quite wrong. If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered sidechain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network. I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at "medium" resolution more of a "spherical cow" approximation to physical reality anyway? Phoebe Zbyszek Otwinowski UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd. Dallas, TX 75390-8816 Tel. 214-645-6385 Fax. 214-645-6353 -- Zbysz
[ccp4bb] Jrh input Re: [ccp4bb] what to do with disordered side chains
Dear Ed,
Thank you for this and apologies for the late reply. If one has chemical evidence for the presence of residues but these residues are disordered I find the delete atoms option disagreeable. Such a static disorder situation should be described by a high atomic displacement parameter, in my view. (nb the use of ADP is better than B factor terminology).
Yours sincerely,
John
Prof John R Helliwell DSc

On 29 Mar 2011, at 22:43, Ed Pozharski wrote:
> The results of the online survey on what to do with disordered side
> chains (from total of 240 responses):
>
> Delete the atoms                                         43%
> Let refinement take care of it by inflating B-factors    41%
> Set occupancy to zero                                    12%
> Other                                                     4%
>
> "Other" suggestions were:
>
> - Place atoms in most likely spot based on rotamer and contacts and
>   indicate high positional sigmas on ATMSIG records
> - To invent refinement that will spread these residues over many rotamers
>   as this is what actually happened
> - Delete the atoms but retain the original amino acid name
> - choose the most common rotamer (B-factors don't "inflate", they just
>   rise slightly)
> - Depends. If the disordered region is uninteresting, delete atoms.
>   Otherwise, try to model it in one or more disordered models (and then
>   state it clearly in the pdb file)
> - In case that no density is in the map, model several conformations of
>   the missing segment and insert them into the PDB file with zero
>   occupancies. It is equivalent to what the NMR people do.
> - Model it in and compare the MD simulations with SAXS
> - I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
> - Let the refinement inflate B-factors, then set occupancy to zero in
>   the last round.
>
> Thanks to all for participation,
>
> Ed.
>
> --
> "I'd jump in myself, if I weren't so good at whistling."
> Julian, King of Lemurs
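For the "delete the atoms but retain the original amino acid name" option in the survey, a downstream user needs some way to spot which residues were truncated. The Python sketch below is one hedged way to list them from the ATOM records of a PDB file. The table of expected side-chain atom names covers only a few polar and charged residues as an example, the function name is hypothetical, and atoms that are present with zero occupancy are counted as present.

```python
# Expected side-chain heavy atoms (standard PDB atom names) for a subset of residues.
EXPECTED_SIDE_CHAIN = {
    "LYS": {"CB", "CG", "CD", "CE", "NZ"},
    "ARG": {"CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"},
    "GLU": {"CB", "CG", "CD", "OE1", "OE2"},
    "ASP": {"CB", "CG", "OD1", "OD2"},
    "GLN": {"CB", "CG", "CD", "OE1", "NE2"},
    "ASN": {"CB", "CG", "OD1", "ND2"},
}

def find_truncated_residues(pdb_lines):
    """Map (chain, residue id, residue name) to the sorted list of missing atoms."""
    seen = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        res_name = line[17:20]                 # columns 18-20
        if res_name not in EXPECTED_SIDE_CHAIN:
            continue
        key = (line[21], line[22:27].strip(), res_name)
        seen.setdefault(key, set()).add(line[12:16].strip())

    missing = {}
    for key, names in seen.items():
        lacking = EXPECTED_SIDE_CHAIN[key[2]] - names
        if lacking:
            missing[key] = sorted(lacking)
    return missing

# Usage (hypothetical file name):
# with open("model.pdb") as handle:
#     for residue, atoms in find_truncated_residues(handle).items():
#         print(residue, "is missing", ", ".join(atoms))
```

A lysine built only out to CB would be reported as missing CD, CE, CG and NZ, which is exactly the "alanine that says lys when you click on it" situation discussed elsewhere in the thread.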
Re: [ccp4bb] what to do with disordered side chains
Hi All, Notwithstanding the stimulating discussion about the B-factor, I'd like to chime in with my $0.02 on the original question of to build or not to build and what are the rules and standards... and sorry for the lengthy e-mail - I was trying to respond to several comments at once. I thought there was a very well-defined rule: models based on experimental data should represent the experimental data correctly. If a model has parts that are not substantiated by experimental data and are based only on assumptions, it's no longer an experimental model. Based on this, one should leave out the atoms for which there is no observable electron density. And one need not say that "we were unable to build a model of the missing side chains" (or any other segments of the structure). There is also no need to guess or fake "most probable" conformations of the unobserved parts. Instead, it should be reported that the segment in question was so flexible that it could not be described by just one or two (or maybe three) conformers. As such, this observation stands on its own feet just like any other observation of "visible" segments and there is no need to fake a model. If the work was done properly, a model with missing parts is not intrinsically inferior to another, more complete model. The fact that a side chain displays flexibility may be biologically much more relevant than some well-defined Ile in the core of the molecule. Omitting unobserved side chains from the model would also help avoid assumptions as if we know for sure that the side chain is there. A given side chain may actually not be there for some reason or another. Sequence errors and radiation-induced damage come to mind, for example. The latter is also often the reason that the side chain may not be fully occupied in the structure derived from a specific data set (i.e. the sum of occupancies of all its existing conformations may not be 1, contrary to earlier suggestions in the thread).
Back in the day I personally spent large amounts of time and effort constructing and refining multi-conformational models of some side chains because I was sure they were there somewhere. Later on, as we learned more, I realized that some of them had been sheared off by radiation damage and actually were not there. As knowledge advances, many of our assumptions may crumble and that's why we ought to keep "experimentally visible" models apart from those with assumed parts. As for the downstream consumers of our models, we may not need to confuse them with strange B factors or occupancies. We just need to give them correct information. Namely, that the given part(s) of the molecule could not be "seen" experimentally due to its flexibility (or, in some cases, to radiation damage). There was an interesting suggestion of two models - one accurately describing the experimental observations and the other for the downstream users. It would be a good way to separate Sci from Fi but there may be a problem. When theories are derived further downstream, it'll be impossible to keep track of what came from Sci and what came from Fi versions. Best regards, N. Ruslan Sanishvili (Nukri), Ph.D. GM/CA-CAT Biosciences Division, ANL 9700 S. Cass Ave. Argonne, IL 60439 Tel: (630)252-0665 Fax: (630)252-0667 rsanishv...@anl.gov -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Dale Tronrud Sent: Thursday, March 31, 2011 4:51 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] what to do with disordered side chains On 3/31/2011 12:52 PM, Jacob Keller wrote: >> The only advantage of a large, positive, number is that it would create >> bugs that are more subtle. > > Although most of the users on this BB probably know more about the > software coding, I am surprised that bugs--even subtle ones--would be > introduced by residues flagged with 0 occupancy and b-factor = 500. > Can you elaborate/enumerate?
The principal problems with defining a particular value of the B factor as magical have already been listed in this thread. The B factor of an atom is usually restrained to the values of the atoms it is bonded to and sometimes to those of other atoms it packs against. You may set the B factor equal to 500.00 but it will not stick. At worst its presence will pull up the B factors of nearby atoms that do have density.

In addition, the only refinement program I know of that takes occupancy into account when restraining bond lengths and angles is CNX. The presence of atoms with occ=0 will affect the location of atoms they share geometry restraints with.

Of course you could modify the refinement programs, and every other program that reads crystallographic models, to deal with your redefinition of _atom_site.B_iso_or_equiv. In fact you would have to, just as you would have to when you change the definition of any of the parameters in our models. If we ha
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dale Tronrud wrote:

> While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another 3) The systematic error introduced by assuming full occupancy for all sites.

You are right that structural heterogeneity is an additional factor. Se-Met expression is one example: the Se-Met residue is often not fully incorporated, so the site has a mixed Se-Met/Met composition. Obviously, solvent molecules may have partial occupancies. Also, in heavily exposed crystals chemical reactions result in loss of functional groups (e.g. by decarboxylation). However, in most cases even if side chains have multiple conformations their total occupancy is 1.0.

> There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html and these numbers are routinely interpreted as though that definition is the law.
>
> Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP for an atom to produce a vaguely defined "B factor" should be considered a flaw to be corrected and not an opportunity to pile even more factors into this field in the PDB file.

B-factors describe the overall uncertainty of the current model. Refinement programs that do not introduce or remove parts of the model (e.g. are not able to add additional conformations) intrinsically pile up all uncertainties into B-factors.
Solutions, which you would like to see implemented, require a model-building-like approach. The test of the success of such an approach would be a substantial decrease of R-free values. If anybody can show it, it would be great.

Zbyszek

Dale Tronrud

On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with frequently implied meaning that it corresponds to a Gaussian probability function. B-factor is simply a scaled (by 8 times pi squared) second moment of uncertainty distribution.

In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation.

For data with a typical resolution for macromolecular crystallography, such multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress more this unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem.

Zbyszek Otwinowski

- they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains.

But this "knowledge" may be quite wrong. If the flaming red really indicates large vibrational motion then yes, one would not
Re: [ccp4bb] what to do with disordered side chains
On 3/31/2011 12:52 PM, Jacob Keller wrote:

>> The only advantage of a large, positive, number is that it would create bugs that are more subtle.
>
> Although most of the users on this BB probably know more about the software coding, I am surprised that bugs--even subtle ones--would be introduced by residues flagged with 0 occupancy and b-factor = 500. Can you elaborate/enumerate?

The principal problems with defining a particular value of the B factor as magical have already been listed in this thread. The B factor of an atom is usually restrained to the values of the atoms it is bonded to and sometimes to those of other atoms it packs against. You may set the B factor equal to 500.00 but it will not stick. At worst its presence will pull up the B factors of nearby atoms that do have density.

In addition, the only refinement program I know of that takes occupancy into account when restraining bond lengths and angles is CNX. The presence of atoms with occ=0 will affect the location of atoms they share geometry restraints with.

Of course you could modify the refinement programs, and every other program that reads crystallographic models, to deal with your redefinition of _atom_site.B_iso_or_equiv. In fact you would have to, just as you would have to when you change the definition of any of the parameters in our models. If we have to modify code, why not create a solution that is explicit, clear, and as consistent with previous practices as possible?

> I think that the worst that could happen is that the unexperienced yet b-factor-savvy user would be astonished by the huge b-factors, even if he did not realize they were flags. At best, being surprised at the precise number 500, he would look into the pdb file and see occupancy = zero, google it, and learn something new about crystallography.

How about positive difference map peaks on neighboring atoms?
How about values for B factors that don't relate to the mean square motion of the atom, despite that being the direct definition of the B factor?

The concept of an "unexperienced yet b-factor-savvy user" is amusing. I'm not b-factor-savvy. Atomic displacement values are easy, but I'm learning new subtleties about B factors all the time.

>> The fundamental problem with your solution is that you are trying to cram two pieces of information into a single number. Such density always causes problems. Each concept needs its own value.
>
> What two pieces of information into what single number? Occupancy = 0 tells you that the atom cannot be modelled, and B=500 is merely a flag for same, and always goes with occ=0. What is so dense? On the contrary, I think the info is redundant if anything...

To be honest I had forgotten that you were proposing that the occupancy be set to zero at the same time. Besides putting two pieces of information in the B factor column (the B factor's value and a flag for "imaginaryness"), you do the same for occupancy (the occupancy's value and a flag for "imaginaryness"). This violates another rule of data structures - that each concept be stored in one, and only one, place. How do you interpret an atom with an occupancy of zero but a B factor of 250? How about an atom with a B factor of 500.00 and an occupancy of 1.00? Now we have the confusing situation that the B factor can only be interpreted in the context of the value of the occupancy and vice versa. Database-savvy people (and I'm not one of them either) are not going to like this.

If you want to calculate the average B factor for a model, certainly those atoms with their B factor = 500 should not be included. However, I gather we do need to include those equal to 500 if their occupancy is not equal to 0.0. This is a mess. In a database application we can't simply SELECT the column with the B factors and average them. We have to SELECT both the B factor and occupancy columns and perform some really weird "if" statements element by element - just to calculate an average! What should be a simple task becomes very complex. Will a graduate student code the calculation correctly? Probably not. They will likely not recall all the complicated interpretations of special values your convention would require.

Now consider this. Refinement is running along and the occupancy for an atom happens to overshoot and, in the middle of refinement, assumes a value of 0.00. There is positive difference density the next cycle. (I did say that it overshot.) Should the refinement program interpret that Occ=0.00 to mean that the atom is imaginary and should not be considered as part of the crystallographic model? Wouldn't it be bad if the atom suddenly disappeared because of a fluctuation? Or should the refinement program use one definition of "occupancy" during refinement, but write a PDB file occupancy that has a different definition? (It might be relevant to this line of thought to recall that the TNT refinement package writes each intermediate coordinate file t
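The averaging problem described in this message can be made concrete with a small sketch. It assumes the proposed B = 500 / occupancy = 0 convention on one side and a hypothetical explicit per-atom flag (here called `imaginary`) on the other; the `Atom` class and the example values are invented purely for illustration and do not come from any real program.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    b_iso: float          # isotropic B factor, in A^2
    occupancy: float
    imaginary: bool = False  # hypothetical explicit flag, not a real PDB field


def mean_b_magic_values(atoms):
    """Mean B under the B=500/occ=0 convention.

    Both columns must be inspected for every atom: an atom is skipped only
    when B == 500 AND occupancy == 0. An atom with occ=0 and B=250, or with
    B=500 and occ=1, is still averaged, because the convention does not say
    what such combinations mean.
    """
    kept = [a.b_iso for a in atoms
            if not (a.b_iso == 500.0 and a.occupancy == 0.0)]
    return sum(kept) / len(kept)


def mean_b_explicit_flag(atoms):
    """Mean B when 'imaginary' is stored in its own field."""
    kept = [a.b_iso for a in atoms if not a.imaginary]
    return sum(kept) / len(kept)


atoms = [
    Atom("CA", 20.0, 1.0),
    Atom("CB", 30.0, 1.0),
    Atom("CG", 500.0, 0.0, imaginary=True),  # flagged dummy atom
    Atom("CD", 500.0, 1.0),                  # B happens to be 500, atom is real
]

if __name__ == "__main__":
    print(mean_b_magic_values(atoms))
    print(mean_b_explicit_flag(atoms))
```

Both functions give the same answer on this input, but only the second one can be written without knowing the special-value rules, which is the point being argued.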
Re: [ccp4bb] what to do with disordered side chains
> The only advantage of a large, positive, number is that it would create
> bugs that are more subtle.

Although most of the users on this BB probably know more about the software coding, I am surprised that bugs--even subtle ones--would be introduced by residues flagged with 0 occupancy and b-factor = 500. Can you elaborate/enumerate?

I think that the worst that could happen is that the unexperienced yet b-factor-savvy user would be astonished by the huge b-factors, even if he did not realize they were flags. At best, being surprised at the precise number 500, he would look into the pdb file and see occupancy = zero, google it, and learn something new about crystallography.

> The fundamental problem with your solution is that you are trying to
> cram two pieces of information into a single number. Such density always
> causes problems. Each concept needs its own value.

What two pieces of information into what single number? Occupancy = 0 tells you that the atom cannot be modelled, and B=500 is merely a flag for same, and always goes with occ=0. What is so dense? On the contrary, I think the info is redundant if anything...

> either. You can't out-think someone who's not paying attention. At
> some point you have to assume that people being paid to perform research
> will learn the basics of the data they are using, even if you know that
> assumption is not 100% true.

Well, the problem is not *should* but *do*. Should we print bilingual danger signs in the US? Shouldn't we assume that people know English? But there is danger, and we care about sparing lives. Here too, if we care about the truth being abused or missed, it seems we should go out of our way.

JPK

--
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***
Re: [ccp4bb] what to do with disordered side chains
On 3/31/2011 10:14 AM, Jacob Keller wrote:

>> What do we gain? As Dale pointed out, we are already abusing either occupancy or B-factor, or deleting the side chain, to compensate for our inability to tell the user that the side chain is disordered. With your proposal, we would fudge both occupancy and B-factor, which in my eyes is even worse than fudging just one of the two.
>
> We gain clarity to the non-crystallographer user: a b-factor of 278.9
> sounds like possibly something real. A b-factor of exactly 1000 does
> not. Both probably have the same believability, viz., ~zero. Also,
> setting occupancy = zero is not fudging but rather respectfully
> declining to comment based on lack of data. I think it is exactly the
> same as omitting residues one cannot see in the density.

These things are never clear unless there is a solid definition of the terms you are using. I don't think you can come up with an "out of band" value for the B factor that doesn't have a legitimate meaning as an atomic displacement parameter for someone. How large a B factor you can meaningfully define depends on your lower resolution limit. People working with electron microscopy or small angle X-ray scattering could easily build models with ADPs far larger than anything we normally encounter.

In addition, you can't define "1000" as a magic value since the PDB format will only allow values up to 999.99, and I presume maintaining the PDB format is one of your goals. Of course, you could choose -99.99 as the magic value but that would break all of our existing software and I presume you don't want that either. Actually defining any value for the B factor as the magic value would break all of our software. The only advantage of a large, positive, number is that it would create bugs that are more subtle.

The fundamental problem with your solution is that you are trying to cram two pieces of information into a single number. Such density always causes problems. Each concept needs its own value.
You could implement your solution easily in mmCIF. Just create a new tag, say _atom_site.imaginary_site, which is either true or false for every atom. Then everyone would be able to either filter out the fake atoms or leave them in, without ambiguity or confusion. If you object that the naive user of structural models wouldn't know to check this tag - they aren't going to know about your magic B factor either. You can't out-think someone who's not paying attention. At some point you have to assume that people being paid to perform research will learn the basics of the data they are using, even if you know that assumption is not 100% true. Dale Tronrud
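A minimal sketch of how a consumer might use such a tag follows. Note that `_atom_site.imaginary_site` is only the tag name proposed in this message and is not part of the actual mmCIF dictionary; the fragment, its values, and the tiny hand-written loop parser are assumptions made for illustration (real files should be read with a proper mmCIF parser).

```python
# Made-up mmCIF-style fragment. The last tag is the hypothetical one
# suggested in the message above; the other tags exist in mmCIF.
CIF_FRAGMENT = """\
loop_
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.imaginary_site
CA LYS 1.00 22.5 false
CB LYS 1.00 25.1 false
CG LYS 1.00 60.3 true
CD LYS 1.00 61.0 true
"""


def parse_simple_loop(text):
    """Parse one whitespace-delimited loop_ block into a list of dicts.

    This handles only the simple case shown here (no quoting, no
    multi-line values); it is not a general mmCIF reader.
    """
    tags, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_"):
            tags.append(line)
        else:
            rows.append(dict(zip(tags, line.split())))
    return rows


def observed_atoms(rows):
    """Keep only atoms that are not flagged as imaginary."""
    return [r for r in rows if r["_atom_site.imaginary_site"] != "true"]


if __name__ == "__main__":
    for row in observed_atoms(parse_simple_loop(CIF_FRAGMENT)):
        print(row["_atom_site.label_atom_id"], row["_atom_site.B_iso_or_equiv"])
```

With the flag in its own column, the occupancy and B factor of every atom keep their ordinary meaning, and leaving the dummy atoms in or out is a one-line filter.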
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another

3) The systematic error introduced by assuming full occupancy for all sites.

There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html and these numbers are routinely interpreted as though that definition is the law.

Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP for an atom to produce a vaguely defined "B factor" should be considered a flaw to be corrected and not an opportunity to pile even more factors into this field in the PDB file.

Dale Tronrud

On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with frequently implied meaning that it corresponds to a Gaussian probability function. B-factor is simply a scaled (by 8 times pi squared) second moment of uncertainty distribution.
In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation. For data with a typical resolution for macromolecular crystallography, such multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress more this unusual form of presenting the standard deviation. A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion. With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem. Zbyszek Otwinowski - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains. But this "knowledge" may be quite wrong. 
If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered sidechain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network.

I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at "medium" resolution more of a "spherical cow" approximation to physical reality anyway?

Phoebe

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
On Thursday, March 31, 2011 10:05:22 am Hailiang Zhang wrote:
> Dear Zbyszek:
>
> Thanks a lot for your good summary. It is very interesting but, do you
> think there are some references for more detailed description, especially
> from mathematics point of view about correlating B-factor to the Gaussian
> probability distribution (the B-factor unit of A^2 is my first doubt as
> for the probability distribution description)? Thanks again for your
> efforts!
>
> Best Regards, Hailiang

I already cited the IUCr standard once, but here it is again:

Trueblood, et al, 1996; Acta Cryst. A52, 770-781
http://dx.doi.org/10.1107/S0108767396005697

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
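On the question of units: the "scaled second moment" description quoted in this thread corresponds to the standard relation B = 8 * pi^2 * <u^2>, where <u^2> is the mean-square displacement along one direction, so B carries units of length squared (A^2). The short sketch below simply inverts that relation; the function names are invented for this example.

```python
import math


def b_to_rms_displacement(b_iso):
    """Return sqrt(<u^2>) in Angstrom for an isotropic B factor in A^2,
    using B = 8 * pi^2 * <u^2>."""
    if b_iso < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_iso / (8.0 * math.pi ** 2))


def rms_displacement_to_b(rms):
    """Inverse conversion: rms displacement in Angstrom to B in A^2."""
    return 8.0 * math.pi ** 2 * rms ** 2


if __name__ == "__main__":
    for b in (20.0, 100.0, 500.0):
        rms = b_to_rms_displacement(b)
        print(f"B = {b:6.1f} A^2 -> rms displacement = {rms:.2f} A")
```

Because the conversion is a simple square and scale, it can be inverted without loss, which is the sense in which the B factor column is said to carry a standard deviation in an unusual form.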
Re: [ccp4bb] what to do with disordered side chains
On Thu, Mar 31, 2011 at 10:14 AM, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:

> Also,
> setting occupancy = zero is not fudging but rather respectfully
> declining to comment based on lack of data. I think it is exactly the
> same as omitting residues one cannot see in the density.

No, it's not the same. If you have placed any atoms, even with zero occupancy, you have said something about where you expect the atoms to be, or at least where the refinement program thinks they should be. "Declining to comment" would be deleting them, not guessing.

> I think a reasonable number could be derived and agreed upon, and
> would not be surprised if there is such a derivation or analysis in
> the literature answering the question:
>
> "At what b-factor does modelling an atom become insignificant with
> respect to explaining/predicting/fitting the data?"
>
> That point would be the b-factor/occupancy cutoff.

Although atoms with very high B-factors may have almost no impact on F(calc), if the occupancy is non-zero they will still be driven by gradients with respect to X-ray data, and their positions (or changes thereof) will in turn affect other atoms, through geometry restraints if not F(calc). So there is no point at which these atoms cease to be relevant to the task of fitting.

-Nat
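To put rough numbers on "almost no impact on F(calc)", the sketch below evaluates the standard isotropic Debye-Waller factor exp(-B (sin(theta)/lambda)^2), with sin(theta)/lambda = 1/(2d), which scales an atom's scattering factor at resolution d. It covers only this attenuation term and says nothing about gradients or geometry restraints; the function name and the chosen B and d values are illustrative assumptions.

```python
import math


def debye_waller(b_iso, d_spacing):
    """Attenuation of an atom's scattering factor at resolution d (Angstrom)
    for an isotropic B factor (A^2): exp(-B / (4 d^2))."""
    s = 1.0 / (2.0 * d_spacing)  # sin(theta) / lambda
    return math.exp(-b_iso * s * s)


if __name__ == "__main__":
    for b in (20.0, 100.0, 500.0):
        for d in (2.0, 3.0, 10.0, 50.0):
            print(f"B = {b:5.0f} A^2, d = {d:4.1f} A -> factor = {debye_waller(b, d):.3g}")
```

The factor for a given B is tiny at high resolution but approaches 1 at very low resolution, so any single cutoff value would depend on the resolution range of the data, in line with the earlier remark about the lower resolution limit.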
Re: [ccp4bb] what to do with disordered side chains
> What do we gain? As Dale pointed out, we are already abusing either
> occupancy or B-factor, or deleting the side chain, to compensate for our
> inability to tell the user that the side chain is disordered. With your
> proposal, we would fudge both occupancy and B-factor, which in my eyes is
> even worse than fudging just one of the two.

We gain clarity to the non-crystallographer user: a b-factor of 278.9 sounds like possibly something real. A b-factor of exactly 1000 does not. Both probably have the same believability, viz., ~zero. Also, setting occupancy = zero is not fudging but rather respectfully declining to comment based on lack of data. I think it is exactly the same as omitting residues one cannot see in the density.

> Also, who should decide on the magic number: the all-knowing gurus at the
> protein data bank? Maybe we should really start using cif files, which allow
> one to specify coordinate uncertainties.

I think a reasonable number could be derived and agreed upon, and would not be surprised if there is such a derivation or analysis in the literature answering the question:

"At what b-factor does modelling an atom become insignificant with respect to explaining/predicting/fitting the data?"

That point would be the b-factor/occupancy cutoff.

JPK
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dear Zbyszek:

Thanks a lot for your good summary. It is very interesting but, do you think there are some references for more detailed description, especially from mathematics point of view about correlating B-factor to the Gaussian probability distribution (the B-factor unit of A^2 is my first doubt as for the probability distribution description)? Thanks again for your efforts!

Best Regards, Hailiang

> The B-factor in crystallography represents the convolution (sum) of two
> types of uncertainties about the atom (electron cloud) position:
>
> 1) dispersion of atom positions in crystal lattice
> 2) uncertainty of the experimenter's knowledge about the atom position.
>
> In general, uncertainty need not be described by a Gaussian function.
> However, communicating uncertainty using the second moment of its
> distribution is a widely accepted practice, with frequently implied
> meaning that it corresponds to a Gaussian probability function. B-factor
> is simply a scaled (by 8 times pi squared) second moment of uncertainty
> distribution.
>
> In the previous, long thread, confusion was generated by the additional
> assumption that B-factor also corresponds to a Gaussian probability
> distribution and not just to a second moment of any probability
> distribution. Crystallographic literature often implies the Gaussian
> shape, so there is some justification for such an interpretation, where
> the more complex probability distribution is represented by the sum of
> displaced Gaussians, where the area under each Gaussian component
> corresponds to the occupancy of an alternative conformation.
>
> For data with a typical resolution for macromolecular crystallography,
> such multi-Gaussian description of the atom position's uncertainty is not
> practical, as it would lead to instability in the refinement and/or
> overfitting. Due to this, a simplified description of the atom's position
> uncertainty by just the second moment of probability distribution is the
> right approach. For this reason, the PDB format is highly suitable for the
> description of positional uncertainties, the only difference with other
> fields being the unusual form of squaring and then scaling up the standard
> uncertainty. As this calculation can be easily inverted, there is no loss
> of information. However, in teaching one should probably stress more this
> unusual form of presenting the standard deviation.
>
> A separate issue is the use of restraints on B-factor values, a subject
> that probably needs a longer discussion.
>
> With respect to the previous thread, representing poorly-ordered (so
> called 'disordered') side chains by the most likely conformer with
> appropriately high B-factors is fully justifiable, and currently is
> probably the best solution to a difficult problem.
>
> Zbyszek Otwinowski
>
>
> - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains.
>>>
>>> But this "knowledge" may be quite wrong. If the flaming red really indicates
>>> large vibrational motion then yes, one would not bet on stable H-bonds.
>>> But if the flaming red indicates that a well-ordered sidechain was incorrectly
>>> modeled at full occupancy when in fact it is only present at half-occupancy
>>> then no, the H-bond could be strong but only present in that half-occupancy
>>> conformation. One presumes that the other half-occupancy location (perhaps
>>> missing from the model) would have its own H-bonding network.
>>>
>>
>> I beg to differ. If a side chain has 2 or more positions, one should be a
>> bit careful about making firm conclusions based on only one of those, even
>> if it isn't clear exactly why one should use caution. Also, isn't the
>> isotropic B we fit at "medium" resolution more of a "spherical cow"
>> approximation to physical reality anyway?
>>
>> Phoebe
>>
>
> Zbyszek Otwinowski
> UT Southwestern Medical Center at Dallas
> 5323 Harry Hines Blvd.
> Dallas, TX 75390-8816
> Tel. 214-645-6385
> Fax. 214-645-6353
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
This is a lovely summary, and we should make our students read it. - But I'm afraid I do not see how it supports the closing statement in the last paragraph...

phx.

On 31/03/2011 17:06, Zbyszek Otwinowski wrote:

The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in crystal lattice
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with frequently implied meaning that it corresponds to a Gaussian probability function. B-factor is simply a scaled (by 8 times pi squared) second moment of uncertainty distribution.

In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation.

For data with a typical resolution for macromolecular crystallography, such multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress more this unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem.

Zbyszek Otwinowski

- they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains.

But this "knowledge" may be quite wrong. If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered sidechain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network.

I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at "medium" resolution more of a "spherical cow" approximation to physical reality anyway?

Phoebe

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
Re: [ccp4bb] what to do with disordered side chains
Dear Quyen, On Thu, Mar 31, 2011 at 11:27:58AM -0400, Quyen Hoang wrote: > Thank you for your post, Herman. > Since there is no holy bible to provide guidance, perhaps we should hold > off the idea of electing a "powerful dictator" to enforce this? > - at least until we all can come to a consensus on how the "dictator" > should dictate... > ... but that might well be even harder than to decide what to do with disordered side chains ... . With best wishes, Gerard. -- === * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * * === > > > On Mar 31, 2011, at 10:22 AM, herman.schreu...@sanofi-aventis.com wrote: > >> Dear Quyen, >> I am afraid you won't get any better answers than you got so far. There is >> no holy bible telling you what to do with disordered side chains. I fully >> agree with James that you should try to get the best possible model, which >> best explains your data and that will be your decision. Here are my 2 >> cents: >> >> -If you see alternative positions, you have to build them. >> -If you do not see alternative positions, I would not replace one fantasy >> (some call it most likely) orientation with 2 or 3 fantasy orientations. >> -I personally belong to the "let the B-factors take care of it" camp, but >> that is my personal opinion. Leaving side chains out could lead to >> misinterpretations by slightly less savy users of our data, especially >> when charge distributions are being studied. Besides, we know (almost) for >> sure that the side chain is there, it is only disordered and as we just >> learned, even slightly less savy users know what flaming red side chains >> mean. Even if they may not be mathematically entirely correct, huge >> B-factors clearly indicate that there is disorder involved. 
>> -I would not let occupancies take up the slack since even very savy users >> have never heard of them and again, the side chain is fully occupied, only >> disordered. Of course if you build alternate positions, you have to divede >> the occupancies amongst them. >> >> Best, >> Herman >> >> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of >> Quyen Hoang >> Sent: Thursday, March 31, 2011 3:55 PM >> To: CCP4BB@JISCMAIL.AC.UK >> Subject: Re: [ccp4bb] what to do with disordered side chains >> >> We are getting off topic a little bit. >> >> Original topic: is it better to not build disordered sidechains or build >> them and let B-factors take care of it? >> Ed's poll got almost a 50:50 split. >> Question still unanswered. >> >> Second topic introduced by Pavel: "Your B-factors are valid within a >> harmonic (small) approximation of atomic vibrations. Larger scale motions >> you are talking about go beyond the harmonic approximation, and using the >> B-factor to model them is abusing the corresponding mathematical model." >> And that these large scale motions (disorders) are better represented by >> "alternative conformations and associated with them occupancies". >> >> My question is, how many people here do this? >> If you're currently doing what Pavel suggested here, how do you decide >> where to keep the upper limit of B-factors and what the occupancies are >> for each atom (data with resolution of 2.0A or worse)? I mean, do you cap >> the B-factor at a reasonable number to represent natural atomic vibrations >> (which is very small as Pavel pointed out) and then let the occupancies >> pick up the slack? More importantly, what is your reason for doing this? 
>> >> Cheers and thanks for your contribution, >> Quyen >> >> >> On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote: >> >>> Mark, >>> alternative conformations and associated with them occupancies are to >>> describe the larger scale disorder (the one that goes beyond the >>> B-factor's capability to cope with). >>> Multi-model PDB files is another option. >>> Best, >>> Pavel. >>> >>
Re: [ccp4bb] what to do with disordered side chains
Regarding suggestions that the PDB or the IUCr should tell us what to do:

IMO, neither of the usual solutions, (a) deleting side chains when there is no density or (b) letting B-factors go where they will, is without problems (this is clear from the ongoing discussion). I would be really unhappy if some authority unilaterally imposed either of these solutions on the protein crystallographic community.

Sue

Dr. Sue A. Roberts
Dept. of Chemistry and Biochemistry
University of Arizona
1041 E. Lowell St., Tucson, AZ 85721
Phone: 520 621 8171
s...@email.arizona.edu
http://www.biochem.arizona.edu/xray
[ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position:

1) dispersion of atom positions in the crystal lattice
2) uncertainty of the experimenter's knowledge about the atom position.

In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with the frequently implied meaning that it corresponds to a Gaussian probability function. The B-factor is simply a scaled (by 8 times pi squared) second moment of the uncertainty distribution.

In the previous, long thread, confusion was generated by the additional assumption that the B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation.

For data with a typical resolution for macromolecular crystallography, such a multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of the probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably put more stress on this unusual form of presenting the standard deviation.

A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion.

With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem.

Zbyszek Otwinowski

>>> - they all know what B is and how to look for regions of high B
>>> (with, say, pymol) and they know not to make firm conclusions about
>>> H-bonds to flaming red side chains.
>>
>> But this "knowledge" may be quite wrong. If the flaming red really
>> indicates large vibrational motion then yes, one would not bet on stable
>> H-bonds. But if the flaming red indicates that a well-ordered sidechain was
>> incorrectly modeled at full occupancy when in fact it is only present at
>> half-occupancy then no, the H-bond could be strong but only present in that
>> half-occupancy conformation. One presumes that the other half-occupancy
>> location (perhaps missing from the model) would have its own H-bonding
>> network.
>
> I beg to differ. If a side chain has 2 or more positions, one should be a
> bit careful about making firm conclusions based on only one of those, even
> if it isn't clear exactly why one should use caution. Also, isn't the
> isotropic B we fit at "medium" resolution more of a "spherical cow"
> approximation to physical reality anyway?
>
> Phoebe

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
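The "squaring and then scaling up" described in this message, and the statement that it can be easily inverted, can be written out in a few lines. This is only a minimal sketch: it assumes the usual isotropic convention B = 8*pi^2*<u^2>, with <u^2> the mean-square displacement along one direction in Å^2, and the function names are made up for illustration.

```python
import math

# Scale factor between the second moment <u^2> and the B-factor (B = 8*pi^2*<u^2>).
EIGHT_PI_SQUARED = 8.0 * math.pi ** 2


def b_to_rms_displacement(b_factor):
    """Invert B = 8*pi^2*<u^2>: return sqrt(<u^2>) in Å for a B-factor in Å^2."""
    return math.sqrt(b_factor / EIGHT_PI_SQUARED)


def rms_displacement_to_b(rms):
    """Square the standard uncertainty (Å) and scale it up to a B-factor (Å^2)."""
    return EIGHT_PI_SQUARED * rms ** 2


if __name__ == "__main__":
    for b in (20.0, 80.0, 500.0):
        rms = b_to_rms_displacement(b)
        print(f"B = {b:6.1f} Å^2 -> rms displacement = {rms:.2f} Å")
```

Because 8*pi^2 is close to 79, a B-factor of about 79 Å^2 corresponds to a standard uncertainty of 1 Å in this convention.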
Re: [ccp4bb] what to do with disordered side chains
On Thu, 2011-03-31 at 11:27 -0400, Quyen Hoang wrote: > Thank you for your post, Herman. > Since there is no holy bible to provide guidance, perhaps we should > hold off the idea of electing a "powerful dictator" to enforce this? > - at least until we all can come to a consensus on how the "dictator" > should dictate... Well, that is partly what we have the IUCr for, isn't it? A couple of people have referred to the wwPDB in this context, but IMHO the IUCr is a much better forum to try to reach some kind of decision about these issues. If the IUCr has a clear policy, the wwPDB can enforce it (like they did with the deposition of structure factors). If the wwPDB takes a lead a lot of people will get annoyed at them, when the real problem is that crystallographic practitioners haven't come to any agreement amongst themselves. And yes, I know that balancing questions of scientific correctness with the needs of more or less naive consumers of the data isn't straightforward :-) Regards, Peter. -- Peter Keller Tel.: +44 (0)1223 353033 Global Phasing Ltd., Fax.: +44 (0)1223 366889 Sheraton House, Castle Park, Cambridge CB3 0AX United Kingdom
Re: [ccp4bb] what to do with disordered side chains
On Thu, 2011-03-31 at 17:04 +0200, herman.schreu...@sanofi-aventis.com wrote:
> Maybe we should really start using cif files, which allow one to specify
> coordinate uncertainties.

The PDB format has the SIGATM record for that purpose.

--
"I'd jump in myself, if I weren't so good at whistling."
Julian, King of Lemurs
Re: [ccp4bb] what to do with disordered side chains
Thank you for your post, Herman. Since there is no holy bible to provide guidance, perhaps we should hold off the idea of electing a "powerful dictator" to enforce this? - at least until we all can come to a consensus on how the "dictator" should dictate... Cheers, Quyen On Mar 31, 2011, at 10:22 AM, herman.schreu...@sanofi-aventis.com wrote: Dear Quyen, I am afraid you won't get any better answers than you got so far. There is no holy bible telling you what to do with disordered side chains. I fully agree with James that you should try to get the best possible model, which best explains your data and that will be your decision. Here are my 2 cents: -If you see alternative positions, you have to build them. -If you do not see alternative positions, I would not replace one fantasy (some call it most likely) orientation with 2 or 3 fantasy orientations. -I personally belong to the "let the B-factors take care of it" camp, but that is my personal opinion. Leaving side chains out could lead to misinterpretations by slightly less savy users of our data, especially when charge distributions are being studied. Besides, we know (almost) for sure that the side chain is there, it is only disordered and as we just learned, even slightly less savy users know what flaming red side chains mean. Even if they may not be mathematically entirely correct, huge B-factors clearly indicate that there is disorder involved. -I would not let occupancies take up the slack since even very savy users have never heard of them and again, the side chain is fully occupied, only disordered. Of course if you build alternate positions, you have to divede the occupancies amongst them. Best, Herman From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Quyen Hoang Sent: Thursday, March 31, 2011 3:55 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] what to do with disordered side chains We are getting off topic a little bit. 
Original topic: is it better to not build disordered sidechains or build them and let B-factors take care of it? Ed's poll got almost a 50:50 split. Question still unanswered. Second topic introduced by Pavel: "Your B-factors are valid within a harmonic (small) approximation of atomic vibrations. Larger scale motions you are talking about go beyond the harmonic approximation, and using the B-factor to model them is abusing the corresponding mathematical model." And that these large scale motions (disorders) are better represented by "alternative conformations and associated with them occupancies". My question is, how many people here do this? If you're currently doing what Pavel suggested here, how do you decide where to keep the upper limit of B-factors and what the occupancies are for each atom (data with resolution of 2.0A or worse)? I mean, do you cap the B-factor at a reasonable number to represent natural atomic vibrations (which is very small as Pavel pointed out) and then let the occupancies pick up the slack? More importantly, what is your reason for doing this? Cheers and thanks for your contribution, Quyen On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote: Mark, alternative conformations and associated with them occupancies are to describe the larger scale disorder (the one that goes beyond the B-factor's capability to cope with). Multi-model PDB files is another option. Best, Pavel. On Wed, Mar 30, 2011 at 2:15 PM, VAN RAAIJ , MARK JOHAN > wrote: yet, apart from (and additionally to) modelling two conformations of the side-chain, the B-factor is the only tool we have (now). Quoting Pavel Afonine: > Hi Quyen, > > > (...) And if B-factor is an estimate of thermo-motion (or static disorder), >> then would it not be reasonable to accept that building the side- chain and >> let B-factor sky rocket might reflect reality more so than not building it? >> > > NO. Your B-factors are valid within a harmonic (small) approximation of > atomic vibrations. 
Larger scale motions you are talking about go beyond the > harmonic approximation, and using the B-factor to model them is abusing the > corresponding mathematical model. > http://www.phenix-online.org/newsletter/CCN_2010_07.pdf > > Pavel. > Mark J van Raaij Laboratorio M-4 Dpto de Estructura de Macromoléculas Centro Nacional de Biotecnología - CSIC c/Darwin 3, Campus Cantoblanco 28049 Madrid tel. 91 585 4616 email: mjvanra...@cnb.csic.es
Re: [ccp4bb] what to do with disordered side chains
Dear Jacob,

What do we gain? As Dale pointed out, we are already abusing either the occupancy or the B-factor, or deleting the side chain, to compensate for our inability to tell the user that the side chain is disordered. With your proposal, we would fudge both occupancy and B-factor, which in my eyes is even worse than fudging just one of the two. Also, who should decide on the magic number: the all-knowing gurus at the protein data bank? Maybe we should really start using cif files, which allow one to specify coordinate uncertainties.

Best regards,
Herman

-Original Message-
From: Jacob Keller [mailto:j-kell...@fsm.northwestern.edu]
Sent: Thursday, March 31, 2011 4:43 PM
To: Schreuder, Herman R&D/DE
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] what to do with disordered side chains

Why not have the "b-factors take care of it" until some magic cutoff number? When they reach the cutoff, two things happen:

1. Occupancies are set to zero for those side chains, to represent our lack of ability to model the region,

2. B-factors are set to exactly 500, as a "flag" allowing casual b-factor-savvy users to identify suspicious regions, since they will probably not see occupancies, but *will* see b-factors. Therefore, all 0-occupancy atoms will automatically have b-factors = 500. I believe it is true that if the occupancies are zero, the b-factors are totally irrelevant for all calculations?

Doesn't this satisfy both parties?

Jacob

On Thu, Mar 31, 2011 at 9:22 AM, wrote:
> Dear Quyen,
> I am afraid you won't get any better answers than you got so far.
> There is no holy bible telling you what to do with disordered side
> chains. I fully agree with James that you should try to get the best
> possible model, which best explains your data and that will be your decision.
> Here are my 2 cents:
>
> -If you see alternative positions, you have to build them.
> -If you do not see alternative positions, I would not replace one > fantasy (some call it most likely) orientation with 2 or 3 fantasy > orientations. > -I personally belong to the "let the B-factors take care of it" camp, > but that is my personal opinion. Leaving side chains out could lead to > misinterpretations by slightly less savy users of our data, especially > when charge distributions are being studied. Besides, we know (almost) > for sure that the side chain is there, it is only disordered and as we > just learned, even slightly less savy users know what flaming red side > chains mean. Even if they may not be mathematically entirely correct, > huge B-factors clearly indicate that there is disorder involved. > -I would not let occupancies take up the slack since even very savy > users have never heard of them and again, the side chain is fully > occupied, only disordered. Of course if you build alternate positions, > you have to divede the occupancies amongst them. > > Best, > Herman > > > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of > Quyen Hoang > Sent: Thursday, March 31, 2011 3:55 PM > To: CCP4BB@JISCMAIL.AC.UK > Subject: Re: [ccp4bb] what to do with disordered side chains > > We are getting off topic a little bit. > Original topic: is it better to not build disordered sidechains or > build them and let B-factors take care of it? > Ed's poll got almost a 50:50 split. > Question still unanswered. > Second topic introduced by Pavel: "Your B-factors are valid within a > harmonic (small) approximation of atomic vibrations. Larger scale > motions you are talking about go beyond the harmonic approximation, > and using the B-factor to model them is abusing the corresponding > mathematical model." > And that these large scale motions (disorders) are better represented > by "alternative conformations and associated with them occupancies". > My question is, how many people here do this? 
> If you're currently doing what Pavel suggested here, how do you decide > where to keep the upper limit of B-factors and what the occupancies > are for each atom (data with resolution of 2.0A or worse)? I mean, do > you cap the B-factor at a reasonable number to represent natural > atomic vibrations (which is very small as Pavel pointed out) and then > let the occupancies pick up the slack? More importantly, what is your reason > for doing this? > Cheers and thanks for your contribution, Quyen > > On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote: > > Mark, > alternative conformations and associated with them occupancies are to > describe the larger scale disorder (the one that goes beyond the > B-factor's capability to cope with). > Multi-model PDB files is another option. > Best, > Pavel. > >
Re: [ccp4bb] what to do with disordered side chains
Well, I guess I was thinking to make the b-factor such a preposterous value that no one would possibly believe it. Setting occupancies to zero effectively places a stumbling block, because people see the residues and think they are actually supported by data. So to counter-balance this, I thought putting up a high-b-factor flag would prevent people from tripping over the stumbling block. Look, you could even set the b-factor to 1 if you want--just something so people totally discount those coordinates. Jacob On Thu, Mar 31, 2011 at 9:55 AM, Nat Echols wrote: > On Thu, Mar 31, 2011 at 7:42 AM, Jacob Keller > wrote: >> >> Why not have the "b-factors take care of it" until some magic cutoff >> number? When they reach the cutoff, two things happen: >> >> 1. Occupancies are set to zero for those side chains, to represent our >> lack of ability to model the region, >> >> 2. B-factors are set to exactly 500, as a "flag" allowing casual >> b-factor-savvy users to identify suspicious regions, since they will >> probably not see occupancies, but *will* see b-factors. Therefore, all >> 0-occupancy atoms will automatically have b-factors = 500. I believe >> it is true that if the occupancies are zero, the b-factors are totally >> irrelevant for all calculations? >> >> Doesn't this satisfy both parties? > > No, because now you're not only presenting the user with made-up > coordinates, you're giving them a made-up B-factor as well, so there is > effectively no property of those atoms that is based on experimental data > rather than subjective criteria. Regardless of any problems inherent in > letting the B-factors take care of all forms of disorder, they are > nonetheless a refined parameter. > -Nat -- *** Jacob Pearson Keller Northwestern University Medical Scientist Training Program cel: 773.608.9185 email: j-kell...@northwestern.edu ***
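For readers who want to see what the flagging scheme defended above would do to a coordinate file, here is a rough sketch operating on fixed-column PDB ATOM records (occupancy in columns 55-60, B-factor in columns 61-66). Several things here are assumptions and not part of the proposal: the cutoff of 150 is arbitrary because the thread never settles on a number, the rule is applied per atom instead of per side chain, and `format_atom` and `flag_atom_line` are hypothetical helper names.

```python
# Hypothetical numbers: 150 is an arbitrary stand-in for the "magic cutoff";
# 500 is the flag value named in the proposal.
B_CUTOFF = 150.0
B_FLAG = 500.0
BACKBONE = {"N", "CA", "C", "O"}


def format_atom(serial, name, resname, chain, resseq, x, y, z, occ, b):
    """Build a fixed-column ATOM record for the demo (illustrative helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def flag_atom_line(line, cutoff=B_CUTOFF, flag=B_FLAG):
    """Set occupancy to 0.00 and B to the flag value for side-chain atoms with
    B >= cutoff; return every other line unchanged."""
    if not line.startswith("ATOM  ") or len(line) < 66:
        return line
    if line[12:16].strip() in BACKBONE:   # atom name, columns 13-16
        return line
    if float(line[60:66]) < cutoff:       # B-factor, columns 61-66
        return line
    # Overwrite occupancy (columns 55-60) and B-factor (columns 61-66).
    return line[:54] + f"{0.0:6.2f}{flag:6.2f}" + line[66:]


if __name__ == "__main__":
    demo = [
        format_atom(1, " CA ", "LYS", "A", 10, 10.0, 20.0, 30.0, 1.0, 45.10),
        format_atom(2, " CE ", "LYS", "A", 10, 12.0, 22.0, 32.0, 1.0, 120.00),
        format_atom(3, " NZ ", "LYS", "A", 10, 13.0, 23.0, 33.0, 1.0, 180.25),
    ]
    for record in demo:
        print(flag_atom_line(record))
```

Note that after the rewrite nothing about a flagged atom is refined any more, which is exactly the objection quoted in this message.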
Re: [ccp4bb] what to do with disordered side chains
>> - they all know what B is and how to look for regions of high B
>> (with, say, pymol) and they know not to make firm conclusions about H-bonds
>> to flaming red side chains.
>
> But this "knowledge" may be quite wrong. If the flaming red really indicates
> large vibrational motion then yes, one would not bet on stable H-bonds.
> But if the flaming red indicates that a well-ordered sidechain was incorrectly
> modeled at full occupancy when in fact it is only present at half-occupancy
> then no, the H-bond could be strong but only present in that half-occupancy
> conformation. One presumes that the other half-occupancy location (perhaps
> missing from the model) would have its own H-bonding network.

I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at "medium" resolution more of a "spherical cow" approximation to physical reality anyway?

Phoebe
Re: [ccp4bb] what to do with disordered side chains
On Thu, Mar 31, 2011 at 7:42 AM, Jacob Keller < j-kell...@fsm.northwestern.edu> wrote:

> Why not have the "b-factors take care of it" until some magic cutoff
> number? When they reach the cutoff, two things happen:
>
> 1. Occupancies are set to zero for those side chains, to represent our
> lack of ability to model the region,
>
> 2. B-factors are set to exactly 500, as a "flag" allowing casual
> b-factor-savvy users to identify suspicious regions, since they will
> probably not see occupancies, but *will* see b-factors. Therefore, all
> 0-occupancy atoms will automatically have b-factors = 500. I believe
> it is true that if the occupancies are zero, the b-factors are totally
> irrelevant for all calculations?
>
> Doesn't this satisfy both parties?

No, because now you're not only presenting the user with made-up coordinates, you're giving them a made-up B-factor as well, so there is effectively no property of those atoms that is based on experimental data rather than subjective criteria. Regardless of any problems inherent in letting the B-factors take care of all forms of disorder, they are nonetheless a refined parameter.

-Nat
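The quoted question, whether B-factors become irrelevant once the occupancy is zero, can be checked with a back-of-the-envelope calculation. In the standard structure-factor expression each atom's term is scaled by its occupancy times the Debye-Waller factor exp(-B sin^2(theta)/lambda^2), which equals exp(-B / (4 d^2)) at resolution d. The sketch below leaves out the atomic form factor and the phase term, and the function name is illustrative only.

```python
import math


def relative_contribution(occupancy, b_factor, d_spacing):
    """Occupancy times the Debye-Waller factor exp(-B / (4 d^2)).

    b_factor is in Å^2 and d_spacing in Å. The atomic form factor and the
    phase term are deliberately omitted; this is only the scale applied to them.
    """
    return occupancy * math.exp(-b_factor / (4.0 * d_spacing ** 2))


if __name__ == "__main__":
    d = 2.0  # Å, the resolution mentioned elsewhere in the thread
    for occ, b in [(1.0, 20.0), (1.0, 100.0), (1.0, 500.0), (0.0, 20.0), (0.0, 500.0)]:
        scale = relative_contribution(occ, b, d)
        print(f"occ = {occ:.2f}, B = {b:5.1f} Å^2 -> scale at {d} Å = {scale:.3e}")
```

With zero occupancy the product is zero whatever B is, so in this simplified picture the B-factor of such an atom does not enter the calculated structure factors. An atom at full occupancy with B = 500 also contributes essentially nothing at 2 Å.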
Re: [ccp4bb] what to do with disordered side chains
Why not have the "b-factors take care of it" until some magic cutoff number? When they reach the cutoff, two things happen: 1. Occupancies are set to zero for those side chains, to represent our lack of ability to model the region, 2. B-factors are set to exactly 500, as a "flag" allowing casual b-factor-savvy users to identify suspicious regions, since they will probably not see occupancies, but *will* see b-factors. Therefore, all 0-occupancy atoms will automatically have b-factors = 500. I believe it is true that if the occupancies are zero, the b-factors are totally irrelevant for all calculations? Doesn't this satisfy both parties? Jacob On Thu, Mar 31, 2011 at 9:22 AM, wrote: > Dear Quyen, > I am afraid you won't get any better answers than you got so far. There is > no holy bible telling you what to do with disordered side chains. I fully > agree with James that you should try to get the best possible model, which > best explains your data and that will be your decision. Here are my 2 cents: > > -If you see alternative positions, you have to build them. > -If you do not see alternative positions, I would not replace one fantasy > (some call it most likely) orientation with 2 or 3 fantasy orientations. > -I personally belong to the "let the B-factors take care of it" camp, but > that is my personal opinion. Leaving side chains out could lead to > misinterpretations by slightly less savy users of our data, especially when > charge distributions are being studied. Besides, we know (almost) for sure > that the side chain is there, it is only disordered and as we just learned, > even slightly less savy users know what flaming red side chains mean. Even > if they may not be mathematically entirely correct, huge B-factors clearly > indicate that there is disorder involved. > -I would not let occupancies take up the slack since even very savy users > have never heard of them and again, the side chain is fully occupied, only > disordered. 
Of course if you build alternate positions, you have to divede > the occupancies amongst them. > > Best, > Herman > > > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Quyen > Hoang > Sent: Thursday, March 31, 2011 3:55 PM > To: CCP4BB@JISCMAIL.AC.UK > Subject: Re: [ccp4bb] what to do with disordered side chains > > We are getting off topic a little bit. > Original topic: is it better to not build disordered sidechains or build > them and let B-factors take care of it? > Ed's poll got almost a 50:50 split. > Question still unanswered. > Second topic introduced by Pavel: "Your B-factors are valid within a > harmonic (small) approximation of atomic vibrations. Larger scale motions > you are talking about go beyond the harmonic approximation, and using the > B-factor to model them is abusing the corresponding mathematical model." > And that these large scale motions (disorders) are better represented by > "alternative conformations and associated with them occupancies". > My question is, how many people here do this? > If you're currently doing what Pavel suggested here, how do you decide where > to keep the upper limit of B-factors and what the occupancies are for each > atom (data with resolution of 2.0A or worse)? I mean, do you cap the > B-factor at a reasonable number to represent natural atomic vibrations > (which is very small as Pavel pointed out) and then let the occupancies pick > up the slack? More importantly, what is your reason for doing this? > Cheers and thanks for your contribution, > Quyen > > On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote: > > Mark, > alternative conformations and associated with them occupancies are to > describe the larger scale disorder (the one that goes beyond the B-factor's > capability to cope with). > Multi-model PDB files is another option. > Best, > Pavel. 
> > > On Wed, Mar 30, 2011 at 2:15 PM, VAN RAAIJ , MARK JOHAN > wrote: >> >> yet, apart from (and additionally to) modelling two conformations of the >> side-chain, the B-factor is the only tool we have (now). >> Quoting Pavel Afonine: >> >> > Hi Quyen, >> > >> > >> > (...) And if B-factor is an estimate of thermo-motion (or static >> > disorder), >> >> then would it not be reasonable to accept that building the side-chain >> >> and >> >> let B-factor sky rocket might reflect reality more so than not building >> >> it? >> >> >> > >> > NO. Your B-factors are valid within a harmonic (small) approximation of >> > atomic vibrations. Larger scale motions you are talking about go b
Re: [ccp4bb] what to do with disordered side chains
Dear Quyen,

I am afraid you won't get any better answers than you got so far. There is no holy bible telling you what to do with disordered side chains. I fully agree with James that you should try to get the best possible model, which best explains your data and that will be your decision. Here are my 2 cents:

- If you see alternative positions, you have to build them.
- If you do not see alternative positions, I would not replace one fantasy (some call it most likely) orientation with 2 or 3 fantasy orientations.
- I personally belong to the "let the B-factors take care of it" camp, but that is my personal opinion. Leaving side chains out could lead to misinterpretations by slightly less savvy users of our data, especially when charge distributions are being studied. Besides, we know (almost) for sure that the side chain is there, it is only disordered and as we just learned, even slightly less savvy users know what flaming red side chains mean. Even if they may not be mathematically entirely correct, huge B-factors clearly indicate that there is disorder involved.
- I would not let occupancies take up the slack since even very savvy users have never heard of them and again, the side chain is fully occupied, only disordered. Of course if you build alternate positions, you have to divide the occupancies amongst them.

Best,
Herman

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Quyen Hoang
Sent: Thursday, March 31, 2011 3:55 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] what to do with disordered side chains

We are getting off topic a little bit.

Original topic: is it better to not build disordered sidechains or build them and let B-factors take care of it? Ed's poll got almost a 50:50 split. Question still unanswered.

Second topic introduced by Pavel: "Your B-factors are valid within a harmonic (small) approximation of atomic vibrations.
Larger scale motions you are talking about go beyond the harmonic approximation, and using the B-factor to model them is abusing the corresponding mathematical model." And that these large scale motions (disorders) are better represented by "alternative conformations and associated with them occupancies". My question is, how many people here do this? If you're currently doing what Pavel suggested here, how do you decide where to keep the upper limit of B-factors and what the occupancies are for each atom (data with resolution of 2.0A or worse)? I mean, do you cap the B-factor at a reasonable number to represent natural atomic vibrations (which is very small as Pavel pointed out) and then let the occupancies pick up the slack? More importantly, what is your reason for doing this? Cheers and thanks for your contribution, Quyen On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote: Mark, alternative conformations and associated with them occupancies are to describe the larger scale disorder (the one that goes beyond the B-factor's capability to cope with). Multi-model PDB files is another option. Best, Pavel. On Wed, Mar 30, 2011 at 2:15 PM, VAN RAAIJ , MARK JOHAN wrote: yet, apart from (and additionally to) modelling two conformations of the side-chain, the B-factor is the only tool we have (now). Quoting Pavel Afonine: > Hi Quyen, > > > (...) And if B-factor is an estimate of thermo-motion (or static disorder), >> then would it not be reasonable to accept that building the side-chain and >> let B-factor sky rocket might reflect reality more so than not building it? >> > > NO. Your B-factors are valid within a harmonic (small) approximation of > atomic vibrations. Larger scale motions you are talking about go beyond the > harmonic approximation, and using the B-factor to model them is abusing the > corresponding mathematical model. > http://www.phenix-online.org/newsletter/CCN_2010_07.pdf > > Pavel. > Mark J van Raaij Laboratorio M-4 Dpto de Estructura de Macromoléculas
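Herman's last point in this message, that occupancies have to be divided amongst alternate positions when they are built, lends itself to a simple consistency check. The sketch below assumes the atoms have already been parsed into (chain, residue number, atom name, altLoc, occupancy) tuples; that layout, the function names and the 0.01 tolerance are choices made for illustration, not anything prescribed in the thread.

```python
from collections import defaultdict


def altloc_occupancy_sums(atoms):
    """Sum occupancies over alternate locations for each (chain, resseq, atom).

    `atoms` is an iterable of (chain, resseq, atom_name, altloc, occupancy);
    atoms with a blank altloc are single-conformer and are skipped.
    """
    sums = defaultdict(float)
    for chain, resseq, atom_name, altloc, occupancy in atoms:
        if altloc.strip():
            sums[(chain, resseq, atom_name)] += occupancy
    return dict(sums)


def atoms_not_summing_to_one(atoms, tol=0.01):
    """Return the atoms whose alternate-conformer occupancies do not add up to 1."""
    return {key: total
            for key, total in altloc_occupancy_sums(atoms).items()
            if abs(total - 1.0) > tol}


if __name__ == "__main__":
    demo = [
        ("A", 10, "NZ", "A", 0.6),
        ("A", 10, "NZ", "B", 0.4),   # 0.6 + 0.4: consistent
        ("A", 25, "OG", "A", 0.5),
        ("A", 25, "OG", "B", 0.3),   # only 0.8 accounted for
        ("A", 30, "CB", " ", 1.0),   # no alternate location
    ]
    print(atoms_not_summing_to_one(demo))
```

A sum below 1 is not automatically an error, since a further conformer may simply not have been modelled, so the output is better read as a list of places to look at than as a list of mistakes.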
Re: [ccp4bb] what to do with disordered side chains
We are getting off topic a little bit.

Original topic: is it better to not build disordered sidechains or build them and let B-factors take care of it? Ed's poll got almost a 50:50 split. Question still unanswered.

Second topic introduced by Pavel: "Your B-factors are valid within a harmonic (small) approximation of atomic vibrations. Larger scale motions you are talking about go beyond the harmonic approximation, and using the B-factor to model them is abusing the corresponding mathematical model." And that these large scale motions (disorders) are better represented by "alternative conformations and associated with them occupancies".

My question is, how many people here do this? If you're currently doing what Pavel suggested here, how do you decide where to keep the upper limit of B-factors and what the occupancies are for each atom (data with resolution of 2.0A or worse)? I mean, do you cap the B-factor at a reasonable number to represent natural atomic vibrations (which is very small as Pavel pointed out) and then let the occupancies pick up the slack? More importantly, what is your reason for doing this?

Cheers and thanks for your contribution,
Quyen

On Mar 30, 2011, at 5:20 PM, Pavel Afonine wrote:
> Mark,
> alternative conformations and associated with them occupancies are to
> describe the larger scale disorder (the one that goes beyond the
> B-factor's capability to cope with). Multi-model PDB files is another option.
> Best, Pavel.
>
> On Wed, Mar 30, 2011 at 2:15 PM, VAN RAAIJ, MARK JOHAN wrote:
>> yet, apart from (and additionally to) modelling two conformations of the
>> side-chain, the B-factor is the only tool we have (now).
>>
>> Quoting Pavel Afonine:
>>> Hi Quyen,
>>>
>>>> (...) And if B-factor is an estimate of thermo-motion (or static disorder),
>>>> then would it not be reasonable to accept that building the side-chain and
>>>> let B-factor sky rocket might reflect reality more so than not building it?
>>>
>>> NO. Your B-factors are valid within a harmonic (small) approximation of
>>> atomic vibrations. Larger scale motions you are talking about go beyond the
>>> harmonic approximation, and using the B-factor to model them is abusing the
>>> corresponding mathematical model.
>>> http://www.phenix-online.org/newsletter/CCN_2010_07.pdf
>>>
>>> Pavel.
>>
>> Mark J van Raaij
>> Laboratorio M-4
>> Dpto de Estructura de Macromoléculas
>> Centro Nacional de Biotecnología - CSIC
>> c/Darwin 3, Campus Cantoblanco
>> 28049 Madrid
>> tel. 91 585 4616
>> email: mjvanra...@cnb.csic.es
Re: [ccp4bb] what to do with disordered side chains
Mark,

alternative conformations and associated with them occupancies are to describe the larger scale disorder (the one that goes beyond the B-factor's capability to cope with). Multi-model PDB files is another option.

Best,
Pavel.

On Wed, Mar 30, 2011 at 2:15 PM, VAN RAAIJ, MARK JOHAN <mjvanra...@cnb.csic.es> wrote:
> yet, apart from (and additionally to) modelling two conformations of the
> side-chain, the B-factor is the only tool we have (now).
>
> Quoting Pavel Afonine:
> > Hi Quyen,
> >
> >> (...) And if B-factor is an estimate of thermo-motion (or static disorder),
> >> then would it not be reasonable to accept that building the side-chain and
> >> let B-factor sky rocket might reflect reality more so than not building it?
> >
> > NO. Your B-factors are valid within a harmonic (small) approximation of
> > atomic vibrations. Larger scale motions you are talking about go beyond the
> > harmonic approximation, and using the B-factor to model them is abusing the
> > corresponding mathematical model.
> > http://www.phenix-online.org/newsletter/CCN_2010_07.pdf
> >
> > Pavel.
>
> Mark J van Raaij
> Laboratorio M-4
> Dpto de Estructura de Macromoléculas
> Centro Nacional de Biotecnología - CSIC
> c/Darwin 3, Campus Cantoblanco
> 28049 Madrid
> tel. 91 585 4616
> email: mjvanra...@cnb.csic.es
Re: [ccp4bb] what to do with disordered side chains
yet, apart from (and additionally to) modelling two conformations of the side-chain, the B-factor is the only tool we have (now).

Quoting Pavel Afonine:
> Hi Quyen,
>
>> (...) And if B-factor is an estimate of thermo-motion (or static disorder),
>> then would it not be reasonable to accept that building the side-chain and
>> let B-factor sky rocket might reflect reality more so than not building it?
>
> NO. Your B-factors are valid within a harmonic (small) approximation of
> atomic vibrations. Larger scale motions you are talking about go beyond the
> harmonic approximation, and using the B-factor to model them is abusing the
> corresponding mathematical model.
> http://www.phenix-online.org/newsletter/CCN_2010_07.pdf
>
> Pavel.

Mark J van Raaij
Laboratorio M-4
Dpto de Estructura de Macromoléculas
Centro Nacional de Biotecnología - CSIC
c/Darwin 3, Campus Cantoblanco
28049 Madrid
tel. 91 585 4616
email: mjvanra...@cnb.csic.es
Re: [ccp4bb] what to do with disordered side chains
On Wednesday, March 30, 2011 11:04:30 am James Holton wrote:
> perhaps a better name for the "disordered side chain problem" would be
> "dark density"? This name would place it properly amongst "dark
> matter", "dark energy" and other fudge factors introduced to try and
> explain why our "standard model" is not consistent with observation?

Funny you should mention that.
I have a partial answer to the problem - Stay Tuned!

Ethan

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
Re: [ccp4bb] what to do with disordered side chains
Hi Quyen,

> (...) And if B-factor is an estimate of thermo-motion (or static disorder),
> then would it not be reasonable to accept that building the side-chain and
> let B-factor sky rocket might reflect reality more so than not building it?

NO. Your B-factors are valid within a harmonic (small) approximation of atomic vibrations. Larger scale motions you are talking about go beyond the harmonic approximation, and using the B-factor to model them is abusing the corresponding mathematical model.
http://www.phenix-online.org/newsletter/CCN_2010_07.pdf

Pavel.
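To put a number on the harmonic-approximation argument above: for an isotropic Debye-Waller factor, B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction. The short Python sketch below converts a refined B into the rms displacement it implies. The function name and the sample B values are illustrative only and do not come from the thread.

```python
import math


def rms_displacement(b_factor):
    """Return sqrt(<u^2>) in Angstrom for an isotropic B-factor in Angstrom^2.

    Uses the standard relation B = 8 * pi^2 * <u^2>.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Illustrative B values: a well-ordered atom, a floppy side chain,
# and a side chain whose B has "sky rocketed".
for b in (20.0, 80.0, 150.0):
    print(f"B = {b:5.1f} A^2  ->  rms displacement = {rms_displacement(b):.2f} A")
```

A B of about 80 Å² already corresponds to an rms displacement of roughly 1 Å, a sizeable fraction of a covalent bond length, which gives a feel for where "small vibration" stops being a comfortable description.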
Re: [ccp4bb] what to do with disordered side chains
>>> in the past, and maybe it's time for a powerful dictator at the PDB to
>>> create the law...
>>>
>>> Filip Van Petegem
>>>
>>> On Wed, Mar 30, 2011 at 8:37 AM, Mark J van Raaij wrote:
>>> perhaps the IUCr and/or PDB (Gerard K?) should issue some guidelines along
>>> these lines?
>>> And oblige us all to follow them?
>>> Mark J van Raaij
>>> Laboratorio M-4
>>> Dpto de Estructura de Macromoleculas
>>> Centro Nacional de Biotecnologia - CSIC
>>> c/Darwin 3, Campus Cantoblanco
>>> E-28049 Madrid, Spain
>>> tel. (+34) 91 585 4616
>>> http://www.cnb.csic.es/content/research/macromolecular/mvraaij/index.php?l=1
>>>
>>> On 30 Mar 2011, at 17:29, Phoebe Rice wrote:
>>>
>>> > I've now polled 4 fairly savvy "end users" of crystal structures and
>>> > there seems to be a consensus:
>>> >
>>> > - they all know what B is and how to look for regions of high B (with,
>>> > say, pymol) and they know not to make firm conclusions about H-bonds to
>>> > flaming red side chains.
>>> > - None of them would ever think to look at occupancy and they don't know
>>> > how anyway.
>>> > - they expect that loops with disordered backbones would not be included
>>> > in the models, and can figure out truncated or fake-ala side chains with
>>> > some additional effort, but that option makes viewing surfaces and
>>> > e-stats more of a pain.
>>> >
>>> > Phoebe
>>> >
>>> > =
>>> > Phoebe A. Rice
>>> > Dept. of Biochemistry & Molecular Biology
>>> > The University of Chicago
>>> > phone 773 834 1723
>>> > http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
>>> > http://www.rsc.org/shop/books/2008/9780854042722.asp
>>> >
>>> > ---- Original message ----
>>> >> Date: Tue, 29 Mar 2011 17:43:49 -0400
>>> >> From: CCP4 bulletin board (on behalf of Ed Pozharski)
>>> >> Subject: [ccp4bb] what to do with disordered side chains
>>> >> To: CCP4BB@JISCMAIL.AC.UK
>>> >>
>>> >> The results of the online survey on what to do with disordered side
>>> >> chains (from total of 240 responses):
>>> >>
>>> >> Delete the atoms                                        43%
>>> >> Let refinement take care of it by inflating B-factors   41%
>>> >> Set occupancy to zero                                   12%
>>> >> Other                                                    4%
>>> >>
>>> >> "Other" suggestions were:
>>> >>
>>> >> - Place atoms in most likely spot based on rotamer and contacts and
>>> >> indicate high positional sigmas on ATMSIG records
>>> >> - To invent refinement that will spread this residues over many rotamers
>>> >> as this is what actually happened
>>> >> - Delete the atoms but retain the original amino acid name
>>> >> - choose the most common rotamer (B-factors don't "inflate", they just
>>> >> rise slightly)
>>> >> - Depends. if the disordered region is uninteresting, delete atoms.
>>> >> Otherwise, try to model it in one or more disordered model (and then
>>> >> state it clearly in the pdb file)
>>> >> - In case that no density is in the map, model several conformations of
>>> >> the missing segment and insert it into the PDB file with zero
>>> >> occupancies. It is equivalent what the NMR people do.
>>> >> - Model it in and compare the MD simulations with SAXS
>>> >> - I would assume Dale Tronrod suggestion the best. Sigatm labels.
>>> >> - Let the refinement inflate B-factors, then set occupancy to zero in
>>> >> the last round.
>>> >>
>>> >> Thanks to all for participation,
>>> >>
>>> >> Ed.
>>> >>
>>> >> --
>>> >> "I'd jump in myself, if I weren't so good at whistling."
>>> >> Julian, King of Lemurs
>>>
>>> --
>>> Filip Van Petegem, PhD
>>> Assistant Professor
>>> The University of British Columbia
>>> Dept. of Biochemistry and Molecular Biology
>>> 2350 Health Sciences Mall - Rm 2.356
>>> Vancouver, V6T 1Z3
>>>
>>> phone: +1 604 827 4267
>>> email: filip.vanpete...@gmail.com
>>> http://crg.ubc.ca/VanPetegem/
Re: [ccp4bb] what to do with disordered side chains
I'm afraid this is not a problem that can be solved by "standardization".

Fundamentally, if you are a scientist who has collected some data (be it diffraction spot intensities, cell counts, or substrate concentration vs time), and you have built a "model" to explain that data (be it a constellation of atoms in a unit cell, exponential population growth, or a microscopic reaction mechanism), I think it is generally expected that your model explain the data "to within experimental error". Unfortunately, this is never the case in macromolecular crystallography, where the model-data disagreement (Fobs-Fcalc) is ~4-5x bigger than the "error bars" (sigma(F)).

Now, there is nothing shameful about an incomplete model, especially when thousands of very intelligent people working over half a century have not been able to come up with a better way to build one. In fact, perhaps a better name for the "disordered side chain problem" would be "dark density"? This name would place it properly amongst "dark matter", "dark energy" and other fudge factors introduced to try and explain why our "standard model" is not consistent with observation? That is, "dark density" is the stuff we can't see, but nonetheless must be there somewhere.

Whatever it is, I personally do hold a vain belief that perhaps someday soon the problem of "dark density" will be solved, and that presently instituting a "policy" requiring that all macromolecular models from this day forward remain at least as incomplete as yesterday's models is not a very good idea.

I say: if you think there is "something there" then you should build it in, especially if it is important to the conclusions you are trying to make. You can defend your model the same way you would defend any other scientific model: by using established statistics to show that it agrees with the data better than an "alternative model" (like leaving it out). It is YOUR model, after all! Only you are responsible for how "right" it is.

I do appreciate that students and other novices may have a harder time defining "surfaces" and measuring hydrogen bond lengths in these pesky "floppy regions", but perhaps their education would be served better by learning the truth sooner rather than later?

-James Holton
MAD Scientist

On 3/30/2011 9:26 AM, Filip Van Petegem wrote:
Hello Mark,

I absolutely agree with this. The worst thing is when everybody is following their own personal rules, and there are no major guidelines for end-users to figure out how to interpret those parts. I assume there are no absolute guidelines simply because there isn't any consensus among crystallographers... (from what we can gather from this set of emails...). On the other hand, this discussion has flared up many times in the past, and maybe it's time for a powerful dictator at the PDB to create the law...

Filip Van Petegem

On Wed, Mar 30, 2011 at 8:37 AM, Mark J van Raaij <mjvanra...@cnb.csic.es> wrote:
perhaps the IUCr and/or PDB (Gerard K?) should issue some guidelines along these lines?
And oblige us all to follow them?
Mark J van Raaij
Laboratorio M-4
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3, Campus Cantoblanco
E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/content/research/macromolecular/mvraaij/index.php?l=1

On 30 Mar 2011, at 17:29, Phoebe Rice wrote:
> I've now polled 4 fairly savvy "end users" of crystal structures and there seems to be a consensus:
>
> - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains.
> - None of them would ever think to look at occupancy and they don't know how anyway.
> - they expect that loops with disordered backbones would not be included in the models, and can figure out truncated or fake-ala side chains with some additional effort, but that option makes viewing surfaces and e-stats more of a pain.
>
> Phoebe
>
> =
> Phoebe A. Rice
> Dept. of Biochemistry & Molecular Biology
> The University of Chicago
> phone 773 834 1723
> http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> http://www.rsc.org/shop/books/2008/9780854042722.asp
>
> ---- Original message ----
>> Date: Tue, 29 Mar 2011 17:43:49 -0400
>> From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> (on behalf of Ed Pozharski)
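The comparison James Holton draws above, between the model-data disagreement (Fobs-Fcalc) and the error bars sigma(F), can be sketched in a few lines of Python. The amplitudes below are invented purely for illustration (they are not real data) and were chosen so the ratio lands in the ~4-5x range he quotes. The function names are hypothetical, and Fcalc is assumed to be already on the same scale as Fobs.

```python
def r_factor(f_obs, f_calc):
    """Conventional R: sum |Fobs - Fcalc| / sum Fobs (Fcalc assumed pre-scaled)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def mean_error_ratio(f_obs, f_calc, sigma_f):
    """Average of |Fobs - Fcalc| / sigma(F) over all reflections."""
    ratios = [abs(o - c) / s for o, c, s in zip(f_obs, f_calc, sigma_f)]
    return sum(ratios) / len(ratios)


# Invented amplitudes for five reflections (illustrative only).
F_OBS = [100.0, 250.0, 80.0, 400.0, 150.0]
F_CALC = [120.0, 210.0, 95.0, 330.0, 180.0]
SIGMA_F = [5.0, 9.0, 4.0, 15.0, 7.0]

print(f"R = {r_factor(F_OBS, F_CALC):.3f}")
print(f"mean |Fobs-Fcalc|/sigma(F) = {mean_error_ratio(F_OBS, F_CALC, SIGMA_F):.1f}")
```

With these made-up numbers the measurements are good to about 4-5% while the model misses them by nearly 18%, which is the gap the "dark density" remark is pointing at.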
Re: [ccp4bb] what to do with disordered side chains
> The University of Chicago
> phone 773 834 1723
> http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> http://www.rsc.org/shop/books/2008/9780854042722.asp
>
> ---- Original message ----
> >Date: Tue, 29 Mar 2011 17:43:49 -0400
> >From: CCP4 bulletin board (on behalf of Ed Pozharski)
> >Subject: [ccp4bb] what to do with disordered side chains
> >To: CCP4BB@JISCMAIL.AC.UK
> >
> >The results of the online survey on what to do with disordered side
> >chains (from total of 240 responses):
> >
> >Delete the atoms                                        43%
> >Let refinement take care of it by inflating B-factors   41%
> >Set occupancy to zero                                   12%
> >Other                                                    4%
> >
> >"Other" suggestions were:
> >
> >- Place atoms in most likely spot based on rotamer and contacts and
> >indicate high positional sigmas on ATMSIG records
> >- To invent refinement that will spread this residues over many rotamers
> >as this is what actually happened
> >- Delete the atoms but retain the original amino acid name
> >- choose the most common rotamer (B-factors don't "inflate", they just
> >rise slightly)
> >- Depends. if the disordered region is uninteresting, delete atoms.
> >Otherwise, try to model it in one or more disordered model (and then
> >state it clearly in the pdb file)
> >- In case that no density is in the map, model several conformations of
> >the missing segment and insert it into the PDB file with zero
> >occupancies. It is equivalent what the NMR people do.
> >- Model it in and compare the MD simulations with SAXS
> >- I would assume Dale Tronrod suggestion the best. Sigatm labels.
> >- Let the refinement inflate B-factors, then set occupancy to zero in
> >the last round.
> >
> >Thanks to all for participation,
> >
> >Ed.

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
Re: [ccp4bb] what to do with disordered side chains
Hello Mark,

I absolutely agree with this. The worst thing is when everybody is
following their own personal rules, and there are no major guidelines
for end-users to figure out how to interpret those parts. I assume there
are no absolute guidelines simply because there isn't any consensus
among crystallographers... (from what we can gather from this set of
emails...). On the other hand, this discussion has flared up many times
in the past, and maybe it's time for a powerful dictator at the PDB to
create the law...

Filip Van Petegem

On Wed, Mar 30, 2011 at 8:37 AM, Mark J van Raaij wrote:

> perhaps the IUCr and/or PDB (Gerard K?) should issue some guidelines
> along these lines?
> And oblige us all to follow them?
>
> Mark J van Raaij
> Laboratorio M-4
> Dpto de Estructura de Macromoleculas
> Centro Nacional de Biotecnologia - CSIC
> c/Darwin 3, Campus Cantoblanco
> E-28049 Madrid, Spain
> tel. (+34) 91 585 4616
> http://www.cnb.csic.es/content/research/macromolecular/mvraaij/index.php?l=1
>
> On 30 Mar 2011, at 17:29, Phoebe Rice wrote:
>
> > I've now polled 4 fairly savvy "end users" of crystal structures and
> > there seems to be a consensus:
> >
> > - they all know what B is and how to look for regions of high B
> >   (with, say, PyMOL) and they know not to make firm conclusions about
> >   H-bonds to flaming red side chains.
> > - None of them would ever think to look at occupancy and they don't
> >   know how anyway.
> > - they expect that loops with disordered backbones would not be
> >   included in the models, and can figure out truncated or fake-ala
> >   side chains with some additional effort, but that option makes
> >   viewing surfaces and e-stats more of a pain.
> >
> > Phoebe
> >
> > =
> > Phoebe A. Rice
> > Dept. of Biochemistry & Molecular Biology
> > The University of Chicago
> > phone 773 834 1723
> > http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> > http://www.rsc.org/shop/books/2008/9780854042722.asp
> >
> > Original message
> >> Date: Tue, 29 Mar 2011 17:43:49 -0400
> >> From: CCP4 bulletin board (on behalf of Ed Pozharski)
> >> Subject: [ccp4bb] what to do with disordered side chains
> >> To: CCP4BB@JISCMAIL.AC.UK
> >>
> >> The results of the online survey on what to do with disordered side
> >> chains (from a total of 240 responses):
> >>
> >> Delete the atoms                                        43%
> >> Let refinement take care of it by inflating B-factors   41%
> >> Set occupancy to zero                                   12%
> >> Other                                                    4%
> >>
> >> "Other" suggestions were:
> >>
> >> - Place atoms in most likely spot based on rotamer and contacts and
> >> indicate high positional sigmas on ATMSIG records
> >> - To invent refinement that will spread these residues over many rotamers
> >> as this is what actually happened
> >> - Delete the atoms but retain the original amino acid name
> >> - choose the most common rotamer (B-factors don't "inflate", they just
> >> rise slightly)
> >> - Depends. If the disordered region is uninteresting, delete atoms.
> >> Otherwise, try to model it in one or more disordered models (and then
> >> state it clearly in the pdb file)
> >> - In case no density is in the map, model several conformations of
> >> the missing segment and insert them into the PDB file with zero
> >> occupancies. It is equivalent to what the NMR people do.
> >> - Model it in and compare the MD simulations with SAXS
> >> - I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
> >> - Let the refinement inflate B-factors, then set occupancy to zero in
> >> the last round.
> >>
> >> Thanks to all for participation,
> >>
> >> Ed.
> >>
> >> --
> >> "I'd jump in myself, if I weren't so good at whistling."
> >> Julian, King of Lemurs
>

--
Filip Van Petegem, PhD
Assistant Professor
The University of British Columbia
Dept. of Biochemistry and Molecular Biology
2350 Health Sciences Mall - Rm 2.356
Vancouver, V6T 1Z3
phone: +1 604 827 4267
email: filip.vanpete...@gmail.com
http://crg.ubc.ca/VanPetegem/
Re: [ccp4bb] what to do with disordered side chains
What about both setting the occupancy to 0 *and* setting the B-factors
to some special arbitrary number, say, 500? Then people would easily
pick up on the side chains being dubious, and the refinement would not
be affected by them.

Jacob

On Wed, Mar 30, 2011 at 10:29 AM, Phoebe Rice wrote:

> I've now polled 4 fairly savvy "end users" of crystal structures and
> there seems to be a consensus:
>
> - they all know what B is and how to look for regions of high B (with,
>   say, PyMOL) and they know not to make firm conclusions about H-bonds
>   to flaming red side chains.
> - None of them would ever think to look at occupancy and they don't
>   know how anyway.
> - they expect that loops with disordered backbones would not be
>   included in the models, and can figure out truncated or fake-ala
>   side chains with some additional effort, but that option makes
>   viewing surfaces and e-stats more of a pain.
>
> Phoebe
>
> =
> Phoebe A. Rice
> Dept. of Biochemistry & Molecular Biology
> The University of Chicago
> phone 773 834 1723
> http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> http://www.rsc.org/shop/books/2008/9780854042722.asp
>
> Original message
>>Date: Tue, 29 Mar 2011 17:43:49 -0400
>>From: CCP4 bulletin board (on behalf of Ed Pozharski)
>>Subject: [ccp4bb] what to do with disordered side chains
>>To: CCP4BB@JISCMAIL.AC.UK
>>
>>The results of the online survey on what to do with disordered side
>>chains (from a total of 240 responses):
>>
>>Delete the atoms                                        43%
>>Let refinement take care of it by inflating B-factors   41%
>>Set occupancy to zero                                   12%
>>Other                                                    4%
>>
>>"Other" suggestions were:
>>
>>- Place atoms in most likely spot based on rotamer and contacts and
>>indicate high positional sigmas on ATMSIG records
>>- To invent refinement that will spread these residues over many rotamers
>>as this is what actually happened
>>- Delete the atoms but retain the original amino acid name
>>- choose the most common rotamer (B-factors don't "inflate", they just
>>rise slightly)
>>- Depends. If the disordered region is uninteresting, delete atoms.
>>Otherwise, try to model it in one or more disordered models (and then
>>state it clearly in the pdb file)
>>- In case no density is in the map, model several conformations of
>>the missing segment and insert them into the PDB file with zero
>>occupancies. It is equivalent to what the NMR people do.
>>- Model it in and compare the MD simulations with SAXS
>>- I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
>>- Let the refinement inflate B-factors, then set occupancy to zero in
>>the last round.
>>
>>Thanks to all for participation,
>>
>>Ed.
>>
>>--
>>"I'd jump in myself, if I weren't so good at whistling."
>> Julian, King of Lemurs
>

--
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
***
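
For readers who want to see what Jacob's suggestion would look like in practice, here is a minimal sketch in Python. It assumes fixed-column PDB ATOM records (occupancy in columns 55-60, B-factor in columns 61-66) and treats every atom other than N, CA, C and O as side chain. The function names, the `atom_line` helper and the default value 500 are illustrative choices, not part of any existing program.

```python
# Sketch only: flag the side chain of one residue as "dubious" by setting
# occupancy to 0.00 and B-factor to an arbitrary marker value (500 here).
# Assumes standard fixed-column PDB ATOM records without trailing newlines.

MAIN_CHAIN = {"N", "CA", "C", "O"}  # everything else is treated as side chain


def flag_side_chain(pdb_lines, chain, resseq, b_flag=500.0):
    """Return a copy of pdb_lines with the side-chain atoms of the given
    residue rewritten to occupancy 0.00 and B-factor b_flag."""
    out = []
    for line in pdb_lines:
        if (line.startswith("ATOM")
                and len(line) >= 26
                and line[21] == chain
                and int(line[22:26]) == resseq
                and line[12:16].strip() not in MAIN_CHAIN):
            padded = line.ljust(66)
            # Columns 55-60 hold occupancy, columns 61-66 hold the B-factor.
            line = f"{padded[:54]}{0.0:6.2f}{b_flag:6.2f}{padded[66:]}"
        out.append(line)
    return out


def atom_line(serial, name, resname, chain, resseq, x, y, z, occ, b):
    """Hypothetical helper that builds a simplified ATOM record for testing."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


if __name__ == "__main__":
    model = [
        atom_line(1, "CA", "LYS", "A", 10, 1.0, 2.0, 3.0, 1.0, 20.0),
        atom_line(2, "CE", "LYS", "A", 10, 4.0, 5.0, 6.0, 1.0, 60.0),
    ]
    for record in flag_side_chain(model, "A", 10):
        print(record)
```

Whether a given refinement or graphics program tolerates a B-factor of 500 is a separate question that this sketch does not address.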
Re: [ccp4bb] what to do with disordered side chains
perhaps the IUCr and/or PDB (Gerard K?) should issue some guidelines
along these lines?
And oblige us all to follow them?

Mark J van Raaij
Laboratorio M-4
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3, Campus Cantoblanco
E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/content/research/macromolecular/mvraaij/index.php?l=1

On 30 Mar 2011, at 17:29, Phoebe Rice wrote:

> I've now polled 4 fairly savvy "end users" of crystal structures and
> there seems to be a consensus:
>
> - they all know what B is and how to look for regions of high B (with,
>   say, PyMOL) and they know not to make firm conclusions about H-bonds
>   to flaming red side chains.
> - None of them would ever think to look at occupancy and they don't
>   know how anyway.
> - they expect that loops with disordered backbones would not be
>   included in the models, and can figure out truncated or fake-ala
>   side chains with some additional effort, but that option makes
>   viewing surfaces and e-stats more of a pain.
>
> Phoebe
>
> =
> Phoebe A. Rice
> Dept. of Biochemistry & Molecular Biology
> The University of Chicago
> phone 773 834 1723
> http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> http://www.rsc.org/shop/books/2008/9780854042722.asp
>
> Original message
>> Date: Tue, 29 Mar 2011 17:43:49 -0400
>> From: CCP4 bulletin board (on behalf of Ed Pozharski)
>> Subject: [ccp4bb] what to do with disordered side chains
>> To: CCP4BB@JISCMAIL.AC.UK
>>
>> The results of the online survey on what to do with disordered side
>> chains (from a total of 240 responses):
>>
>> Delete the atoms                                        43%
>> Let refinement take care of it by inflating B-factors   41%
>> Set occupancy to zero                                   12%
>> Other                                                    4%
>>
>> "Other" suggestions were:
>>
>> - Place atoms in most likely spot based on rotamer and contacts and
>> indicate high positional sigmas on ATMSIG records
>> - To invent refinement that will spread these residues over many rotamers
>> as this is what actually happened
>> - Delete the atoms but retain the original amino acid name
>> - choose the most common rotamer (B-factors don't "inflate", they just
>> rise slightly)
>> - Depends. If the disordered region is uninteresting, delete atoms.
>> Otherwise, try to model it in one or more disordered models (and then
>> state it clearly in the pdb file)
>> - In case no density is in the map, model several conformations of
>> the missing segment and insert them into the PDB file with zero
>> occupancies. It is equivalent to what the NMR people do.
>> - Model it in and compare the MD simulations with SAXS
>> - I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
>> - Let the refinement inflate B-factors, then set occupancy to zero in
>> the last round.
>>
>> Thanks to all for participation,
>>
>> Ed.
>>
>> --
>> "I'd jump in myself, if I weren't so good at whistling."
>> Julian, King of Lemurs
Re: [ccp4bb] what to do with disordered side chains
I've now polled 4 fairly savvy "end users" of crystal structures and
there seems to be a consensus:

- they all know what B is and how to look for regions of high B (with,
  say, PyMOL) and they know not to make firm conclusions about H-bonds
  to flaming red side chains.
- None of them would ever think to look at occupancy and they don't
  know how anyway.
- they expect that loops with disordered backbones would not be
  included in the models, and can figure out truncated or fake-ala
  side chains with some additional effort, but that option makes
  viewing surfaces and e-stats more of a pain.

Phoebe

=
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp

Original message
>Date: Tue, 29 Mar 2011 17:43:49 -0400
>From: CCP4 bulletin board (on behalf of Ed Pozharski)
>Subject: [ccp4bb] what to do with disordered side chains
>To: CCP4BB@JISCMAIL.AC.UK
>
>The results of the online survey on what to do with disordered side
>chains (from a total of 240 responses):
>
>Delete the atoms                                        43%
>Let refinement take care of it by inflating B-factors   41%
>Set occupancy to zero                                   12%
>Other                                                    4%
>
>"Other" suggestions were:
>
>- Place atoms in most likely spot based on rotamer and contacts and
>indicate high positional sigmas on ATMSIG records
>- To invent refinement that will spread these residues over many rotamers
>as this is what actually happened
>- Delete the atoms but retain the original amino acid name
>- choose the most common rotamer (B-factors don't "inflate", they just
>rise slightly)
>- Depends. If the disordered region is uninteresting, delete atoms.
>Otherwise, try to model it in one or more disordered models (and then
>state it clearly in the pdb file)
>- In case no density is in the map, model several conformations of
>the missing segment and insert them into the PDB file with zero
>occupancies. It is equivalent to what the NMR people do.
>- Model it in and compare the MD simulations with SAXS
>- I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
>- Let the refinement inflate B-factors, then set occupancy to zero in
>the last round.
>
>Thanks to all for participation,
>
>Ed.
>
>--
>"I'd jump in myself, if I weren't so good at whistling."
> Julian, King of Lemurs
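
Since Phoebe's end users reportedly would not know how to look at occupancy, a short script may be the easiest way to do it. The sketch below lists every atom whose occupancy field is zero, assuming fixed-column PDB records with occupancy in columns 55-60. The function name `zero_occupancy_atoms` is made up for this example and does not come from any existing package.

```python
# Sketch only: report atoms with zero occupancy in a PDB-format model.
# Assumes fixed-column ATOM/HETATM records (occupancy in columns 55-60).


def zero_occupancy_atoms(pdb_lines):
    """Return (chain, residue name, residue number, atom name) tuples for
    every ATOM/HETATM record whose occupancy is exactly 0.00."""
    flagged = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line) < 60:
            continue  # record too short to carry an occupancy field
        try:
            occupancy = float(line[54:60])
        except ValueError:
            continue  # blank or malformed occupancy field
        if occupancy == 0.0:
            flagged.append((
                line[21],               # chain identifier
                line[17:20].strip(),    # residue name
                int(line[22:26]),       # residue sequence number
                line[12:16].strip(),    # atom name
            ))
    return flagged
```

A typical use would be `zero_occupancy_atoms(open("model.pdb").read().splitlines())`, where `model.pdb` stands for whatever file is being inspected.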
[ccp4bb] what to do with disordered side chains
The results of the online survey on what to do with disordered side
chains (from a total of 240 responses):

Delete the atoms                                        43%
Let refinement take care of it by inflating B-factors   41%
Set occupancy to zero                                   12%
Other                                                    4%

"Other" suggestions were:

- Place atoms in most likely spot based on rotamer and contacts and
indicate high positional sigmas on ATMSIG records
- To invent refinement that will spread these residues over many rotamers
as this is what actually happened
- Delete the atoms but retain the original amino acid name
- choose the most common rotamer (B-factors don't "inflate", they just
rise slightly)
- Depends. If the disordered region is uninteresting, delete atoms.
Otherwise, try to model it in one or more disordered models (and then
state it clearly in the pdb file)
- In case no density is in the map, model several conformations of
the missing segment and insert them into the PDB file with zero
occupancies. It is equivalent to what the NMR people do.
- Model it in and compare the MD simulations with SAXS
- I would assume Dale Tronrud's suggestion is the best. Sigatm labels.
- Let the refinement inflate B-factors, then set occupancy to zero in
the last round.

Thanks to all for participation,

Ed.

--
"I'd jump in myself, if I weren't so good at whistling."
                               Julian, King of Lemurs
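
To put the percentages in terms of respondents, the rough head counts can be worked out from the 240 total. The sketch below is only arithmetic on the figures quoted in the survey; because the reported percentages are rounded, the counts are approximate, and the variable names are illustrative.

```python
# Sketch only: convert the rounded survey percentages into approximate
# numbers of respondents out of the 240 reported responses.

TOTAL_RESPONSES = 240

SURVEY_PERCENT = {
    "Delete the atoms": 43,
    "Let refinement take care of it by inflating B-factors": 41,
    "Set occupancy to zero": 12,
    "Other": 4,
}


def approximate_counts(percentages, total):
    """Return the nearest whole number of respondents for each option."""
    return {option: round(total * pct / 100)
            for option, pct in percentages.items()}


counts = approximate_counts(SURVEY_PERCENT, TOTAL_RESPONSES)

if __name__ == "__main__":
    for option, n in counts.items():
        print(f"{option:55s} ~{n:3d} of {TOTAL_RESPONSES}")
```

On these numbers, deleting the atoms and inflating B-factors are separated by only about five respondents, which fits the lack of consensus noted elsewhere in the thread.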