Hi Jacob,

The PDB header has a record for missing atoms (REMARK 470). Coot has an option
to find them, and any decent validation software will warn about incomplete
residues. There is a PDBREPORT entry for every PDB file with a list of
incomplete residues. With only a very small effort, a user doesn't have to go
around clicking on every 'alanine'.
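To illustrate how little effort is involved, here is a minimal sketch that lists incomplete residues from the REMARK 470 section of a PDB header. It assumes the usual wwPDB layout of that section: a column-header line starting with `M RES CSSEQI`, followed by one row per residue giving an optional model number, residue name, chain ID, sequence number and the missing atom names. It also assumes non-blank chain IDs. The function name `incomplete_residues` is hypothetical.

```python
def incomplete_residues(pdb_lines):
    """Return (resname, chain, resseq, missing_atoms) for each REMARK 470 row.

    Assumes the wwPDB layout: a column-header line starting with
    "M RES CSSEQI", then one row per residue with an optional model number,
    residue name, chain ID, sequence number (plus any insertion code) and
    the names of the missing atoms. Chain IDs are assumed to be non-blank.
    """
    results = []
    in_table = False
    for line in pdb_lines:
        if not line.startswith("REMARK 470"):
            in_table = False  # outside the REMARK 470 block
            continue
        body = line[10:].strip()  # text after "REMARK 470"
        if body.startswith("M RES CSSEQI"):
            in_table = True  # per-residue rows follow this header line
            continue
        if not in_table:
            continue
        fields = body.split()
        if fields and fields[0].isdigit():
            fields = fields[1:]  # drop the model number (multi-model entries)
        if len(fields) < 3:
            continue  # too few fields to parse; skipped in this sketch
        resname, chain, resseq, *atoms = fields
        results.append((resname, chain, resseq, atoms))
    return results
```

Passing an open file handle works because each line is stripped before parsing. Each tuple names a residue whose side chain is truncated, such as a lysine that would otherwise pass for an alanine.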

Cheers,
Robbie

> Date: Mon, 4 Apr 2011 16:15:58 -0500
> From: j-kell...@fsm.northwestern.edu
> Subject: Re: [ccp4bb] what to do with disordered side chains
> To: CCP4BB@JISCMAIL.AC.UK
> 
> I like your IMGATM proposal, but wouldn't it also potentially break
> some of the programs? Also--and this is a problem with deleting only
> sidechain atoms in general--it seems that many, myself included, might
> totally miss that an apparent "alanine" is really a trunco-lysine.
> What I like is that it does get around the problem of people
> over-interpreting bogus sidechains, but it perhaps falls short in that it
> misleads people about what residue is there. I, for one, would not
> feel that I had to click on all the alanines in a model to verify that
> they were not lysines, and would be surprised and puzzled for a while
> about why this ala said lys when I clicked on it. Wouldn't you be
> surprised? (Well, maybe not after this thread...)
> 
> JPK
> 
> 
> 
> On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
> >   The definition of _atom_site.occupancy is
> >
> >  The fraction of the atom type present at this site.
> >  The sum of the occupancies of all the atom types at this site
> >  may not significantly exceed 1.0 unless it is a dummy site.
> >
> > When an atom has an occupancy equal to zero that means that the
> > atom is NEVER present at that site - and that is not what you
> > intend to say.  Setting the occupancy to zero does not mean that
> > a full atom is located somewhere in this area.  Quite the opposite.
> >
> >   (The reference to a dummy site is interesting and implies to
> > me that mmCIF already has the mechanism you wish for.)
> >
> >   Having some experience with refining low occupancy atoms and
> > working with dummy marker atoms I'm quite confident that you can
> > never define a B factor cutoff that would work.  No matter what
> > value you choose you will find some atoms in density that refine
> > to values greater than the cutoff, or the limit you choose is so
> > high that you will find marker atoms that refine to less than the
> > limit.  A B factor cutoff cannot work - no matter the value you
> > choose you will always be plagued with false positives or false
> > negatives.
> >
> >   If you really want to stuff this bit into one of these fields
> > you have to go all out.  Set the occupancy of a marker atom to -99.99.
> > This will unambiguously mark the atom as an imaginary one.  This
> > will, of course, break every program that reads PDB format files,
> > but that is what should happen in any case.  If you change the
> > definition of the columns in the file you must mandate that all
> > programs be upgraded to recognize the new definitions.  I don't
> > know how you can do that other than ensuring that the change will
> > cause programs to cough.  To try to slide it by with a magic value
> > that will be silently accepted by existing programs is to beg for
> > bugs and subtle side-effects.
> >
> >   Good luck getting the maintainers of the mmCIF standard to accept
> > a magic value in either of these fields.
> >
> >   How about this: We already have the keywords ATOM and HETATM
> > (and don't ask me why we have two).  How about we create a new
> > record in the PDB format, say IMGATM, that would have all the
> > fields of an ATOM record but would be recognized as whatever the
> > marker is for "dummy" atoms in the current mmCIF?  Existing programs
> > would completely ignore these atoms, as they should until they are
> > modified to do something reasonable with them.  Those of us who
> > have no use for them can either use a switch in the program to
> > ignore them or just grep them out of the file.  Someone could write
> > a program that would take a model with only ATOM and HETATM records
> > and fill out all the desired IMGATM records (Let's call that program
> > WASNIAHC, everyone would remember that!).
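Dale's IMGATM idea can be sketched for a reader that selects records by their six-character record name. This is a hypothetical illustration: IMGATM exists only as a proposal in this thread, not as a PDB record type, and the sketch assumes IMGATM lines reuse the fixed ATOM column layout, as he suggests. A reader that only recognises ATOM and HETATM skips IMGATM lines. The `include_imaginary` flag stands in for the program switch he mentions, and `strip_imaginary` does what `grep -v '^IMGATM'` would do. All names here are illustrative.

```python
REAL_RECORDS = ("ATOM  ", "HETATM")
IMAGINARY_RECORD = "IMGATM"  # hypothetical record type proposed in the thread


def read_atoms(lines, include_imaginary=False):
    """Return (atom name, residue name, chain, resseq, imaginary?) tuples.

    Assumes IMGATM uses the same fixed columns as ATOM (1-based):
    atom name 13-16, residue name 18-20, chain 22, residue number 23-26.
    """
    atoms = []
    for line in lines:
        record = line[:6]
        if record in REAL_RECORDS:
            imaginary = False
        elif record == IMAGINARY_RECORD and include_imaginary:
            imaginary = True
        else:
            continue  # unknown or unwanted record: skipped, like a legacy reader
        atoms.append((line[12:16].strip(), line[17:20], line[21],
                      int(line[22:26]), imaginary))
    return atoms


def strip_imaginary(lines):
    """Equivalent of `grep -v '^IMGATM'`: drop IMGATM records, keep the rest."""
    return [line for line in lines if not line.startswith(IMAGINARY_RECORD)]
```

Because unrecognised records fall through to `continue`, existing behaviour is unchanged until a program deliberately opts in, which is the property Dale is after.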
> >
> >   This solution is unambiguous.  It can be represented in current
> > mmCIF, I think.  The PDB could run WASNIAHC themselves after deposition
> > but before acceptance by the depositor so people like me would not
> > have to deal with them during refinement but would be able to see
> > them before our precious works of art are unleashed on the world.
> >
> >   Seems like a win-win solution to me.
> >
> > Dale Tronrud
> >
> >
> > On 4/3/2011 9:17 PM, Jacob Keller wrote:
> >>
> >> Well, what about getting the default settings on the major molecular
> >> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
> >> While the B cutoff would still be tricky, I assume we could eventually
> >> come to consensus on some reasonable cutoff (2 sigma from the mean?),
> >> and then this approach would allow each free-spirited crystallographer
> >> to keep his own preferred method of dealing with these troublesome
> >> sidechains and nary a novice would be led astray....
> >>
> >> JPK
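Jacob's "novice mode" rule can be made concrete. The sketch below hides atoms with zero occupancy or with a B factor above the mean plus two standard deviations of all B factors in the file. Several details are assumptions: the use of the population standard deviation, taking the statistics over every atom (including the hidden ones), and the function name. It only shows what the proposal would compute. Dale's objection above, that any single B cutoff produces false positives or false negatives, applies to it unchanged.

```python
import statistics


def novice_mode_visible(lines, n_sigma=2.0):
    """Return (visible_lines, b_cutoff) for a hypothetical 'novice mode'.

    An ATOM/HETATM record is hidden if its occupancy is zero or its B factor
    exceeds mean + n_sigma * population standard deviation of all B factors
    in the file. The statistics include every atom, hidden or not.
    """
    atoms = [line for line in lines if line.startswith(("ATOM  ", "HETATM"))]
    if not atoms:
        return [], None
    # Fixed PDB columns (1-based): occupancy 55-60, B factor 61-66.
    bfactors = [float(line[60:66]) for line in atoms]
    cutoff = statistics.mean(bfactors) + n_sigma * statistics.pstdev(bfactors)
    visible = [
        line for line in atoms
        if float(line[54:60]) > 0.0 and float(line[60:66]) <= cutoff
    ]
    return visible, cutoff
```

A viewer would draw only `visible_lines`. Changing `n_sigma` moves the cutoff but cannot remove the overlap between well-ordered and poorly ordered atoms that Dale describes.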
> >>
> >> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:
> >>>
> >>> Most non-structural users are familiar with the sequence of the proteins
> >>> they are studying, and most software does at least display residue
> >>> identity if you select an atom in a residue, so usually it is not
> >>> necessary to do any cross-checking besides selecting an atom in the
> >>> residue and seeing what its residue name is.  The chance of somebody
> >>> misinterpreting a truncated Lys as Ala is, in my experience, much, much
> >>> lower than the chance they will trust the xyz coordinates of atoms with
> >>> zero occupancy or high B factors.
> >>>
> >>> What worries me the most is somebody designing a whole biological
> >>> experiment around an over-interpretation of details that are implied by
> >>> xyz coordinates of atoms, even if those atoms were not resolved in the
> >>> maps.  When this sort of error occurs it is a level of pain and wasted
> >>> effort that makes the "pain" associated with having to build back in
> >>> missing side chains look completely trivial.
> >>>
> >>> As long as the PDB file format is the way users get structural data,
> >>> there is really no good way to communicate "atom exists with no reliable
> >>> coordinates" to the user, given the diversity of software packages out
> >>> there for reading PDB files and the historical lack of any standard way
> >>> of dealing with this issue.  Even if the file format is hacked there is
> >>> no way to force all the existing software out there to understand the
> >>> hack.  A file format that isn't designed with this sort of feature from
> >>> day one is not going to be fixable as a practical matter after so much
> >>> legacy code has accumulated.
> >>>
> >>> -Eric
> >>>
> >>>
> >>>
> >>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
> >>>
> >>>> To the delete-the-atom-niks: do you propose deleting the whole
> >>>> residue or just the side chain? I can understand deleting the whole
> >>>> residue, but deleting only the side chain seems to me to be placing a
> >>>> stumbling block also, and even possibly confusing for an experienced
> >>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
> >>>> is it? I could imagine a lot of frustration-hours arising from this
> >>>> practice, with people cross-checking sequences, looking in the methods
> >>>> sections for mutations...
> >>>>
> >>>> JPK
> >>>>
> >>>
> >>
> >>
> >>
> >
> 
> 
> 
> -- 
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> cel: 773.608.9185
> email: j-kell...@northwestern.edu
> *******************************************
                                          
