On 4/4/2011 2:15 PM, Jacob Keller wrote:
I like your IMGATM proposal, but wouldn't it also potentially break
some of the programs?

   That depends on the program.  Programs I write that read PDB files
silently ignore keywords that they don't recognize.  A model with
IMGATM (or whatever keyword you standardize on) records would be
interpreted as though those dummy atoms don't exist.  If a program
died because of them, or if the PDB consumer wanted to "see" the
dummy atoms the keywords could be replaced with ATOM using a text
editor and a global substitute, and the user would be aware that
there is something different about those atoms.
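To make the "silently ignore" behaviour concrete, here is a minimal Python sketch of a reader that keeps only ATOM and HETATM records and skips everything else. IMGATM is only the record type proposed in this thread, not part of the PDB format, and the `treat_imgatm_as_atom` switch is a hypothetical stand-in for the global IMGATM-to-ATOM substitution in a text editor. Column positions follow the fixed-width PDB ATOM layout.

```python
KNOWN_COORDINATE_RECORDS = {"ATOM", "HETATM"}


def read_atoms(lines, treat_imgatm_as_atom=False):
    """Return (record, atom_name, res_name, res_seq) for each coordinate line.

    Record types this reader does not recognize, including the proposed
    (hypothetical) IMGATM, are skipped without complaint.  Setting
    treat_imgatm_as_atom mimics a global IMGATM -> ATOM substitution.
    """
    atoms = []
    for line in lines:
        record = line[0:6].strip()          # PDB columns 1-6
        if record == "IMGATM" and treat_imgatm_as_atom:
            record = "ATOM"
        if record not in KNOWN_COORDINATE_RECORDS:
            continue                        # REMARK, CRYST1, IMGATM, ...
        atom_name = line[12:16].strip()     # columns 13-16
        res_name = line[17:20].strip()      # columns 18-20
        res_seq = int(line[22:26])          # columns 23-26
        atoms.append((record, atom_name, res_name, res_seq))
    return atoms
```

A program written this way sees the model without its dummy atoms by default, and the user has to ask explicitly to see them.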

   I would hope programs would be modified to do sensible things
with the dummy atoms since they would have a clear indication that
the atoms are indeed dummy.  For a graphics program, maybe the bonds
involving dummy atoms could be drawn at half brightness.  They would
be visible but clearly more ghost-like than the majority
of atoms in the model.  A refinement program could strip them out,
perform the refinement, and rebuild them at the end, if needed,
using WASNIAHC.  I expect they would also be ignored completely in
MR and homology modeling/comparison programs.  In fact, pretty much
any use I would make of the PDB file would involve discarding all
the dummy atoms, but with this scheme I could at least know for
sure which atoms are fantasy and which were built based on density.
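The strip, refine, rebuild cycle for a refinement program could look roughly like this sketch. `refine` and `rebuild` are hypothetical callables standing in for a real refinement program and for the imagined WASNIAHC; only the bookkeeping around the IMGATM lines is shown.

```python
def refine_without_dummies(lines, refine, rebuild=None):
    """Strip IMGATM lines, refine what is left, then optionally rebuild dummies.

    refine:  hypothetical callable, list of PDB lines -> list of PDB lines.
    rebuild: hypothetical WASNIAHC-like callable that takes the refined lines
             and returns fresh IMGATM lines; if None the dummies are dropped.
    """
    # Same effect as `grep -v '^IMGATM'` on the file.
    real = [line for line in lines if not line.startswith("IMGATM")]
    refined = refine(real)
    dummies = rebuild(refined) if rebuild is not None else []
    # Keep a trailing END/ENDMDL record last.
    if refined and refined[-1].startswith("END"):
        return refined[:-1] + dummies + refined[-1:]
    return refined + dummies
```

The dummy atoms are regenerated from the refined model rather than carried through unchanged, since their old positions would no longer match the refined real atoms.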

Also--and this is a problem with deleting only
sidechain atoms in general--it seems that many, myself included, might
totally miss that an apparent "alanine" is really a trunco-lysine.
What I like is that it does get around the problem of people
over-interpreting bogus sidechains, but it falls short, perhaps, in
misleading people about what residue is there. I, for one, would not
feel that I had to click on all the alanines in a model to verify that
they were not lysines, and would be surprised and puzzled for a while
about why this ala said lys when I clicked on it. Wouldn't you be
surprised? (Well, maybe not after this thread...)

   I am surprised any time I see all the atoms in a lysine on the surface.
"What could possibly be holding that thing in place?" is what jumps to my
mind.  When I see a side chain on the surface that ends at CB or CG I
just assume it is something long and waving in the breeze.  I guess it
all depends on what you are used to looking at.

   With dummy atoms clearly labeled as such, the graphics programs
can be programmed as I described above and we would both have the
visual cues that we desire.

   Another advantage of keeping the "dummy flag" separate from the occupancy
and B factor fields is that these are then free to be used in the way
they were intended.  Numerous times I have built side chains that are
visible to their end, but a second conformation ends at the CG.  I split
these side chains into A and B parts with a complete A and a partial B and
the group occupancies of A and B sum to 1.0.  Now if you tell me that
I have to build the entire B side chain and must flag the dummy atoms
with occ=0.0 we have a problem.  For the dummy atoms the occupancies don't
sum to 1.0 any more.  Logic tells me that the occupancy of the dummy atoms
should be the same as all the real B atoms.
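The occupancy argument can be checked mechanically.  Here is a small sketch, assuming the input is just the alternate-conformation atoms of one residue as (atom name, altloc, occupancy) tuples; it requires each altloc group to carry a single group occupancy and the groups to sum to 1.0.  The function name and the 0.01 tolerance are illustrative choices.

```python
from collections import defaultdict


def check_altloc_groups(atoms, tol=0.01):
    """Return a list of occupancy problems for one residue's alt-conf atoms.

    atoms: iterable of (atom_name, altloc, occupancy); atoms without an
    altloc (e.g. main chain at full occupancy) should not be included.
    """
    groups = defaultdict(set)
    for _name, altloc, occ in atoms:
        groups[altloc].add(round(occ, 2))
    # Every atom in a conformation should share the group occupancy.
    problems = [f"altloc {a}: mixed occupancies {sorted(o)}"
                for a, o in sorted(groups.items()) if len(o) > 1]
    if not problems:
        total = sum(next(iter(o)) for o in groups.values())
        if abs(total - 1.0) > tol:
            problems.append(f"group occupancies sum to {total:.2f}, not 1.00")
    return problems
```

A complete A (0.6) and a partial B (0.4) pass; adding a B CD flagged with occ=0.0 makes conformation B inconsistent.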

   This particular case is a good example of why I don't like the idea
of building complete side chains in the absence of density.  If you are
going to build out my B conformation you have to recognize that the reason
I don't see density beyond the CG is that there are B and C conformations
for the next CD atom (remember I already have an A conformation for CD
elsewhere).  To make a logically complete side chain I need to build
two dummy conformations for this residue and split my "real" CG, CB, and
CA B conformation atoms with no way to decide the relative occupancies of
the B and C conformations.  That's a lot of complexity for a blurry bit of
density.  Hell, I have every reason to expect that there is a D conformation
in there too - do I have to build that as well?

   If you expect such a shrub to be built for every surface lysine the
IMGATM keyword and the program WASNIAHC would allow it to be generated
and represented in an unambiguous and minimally confusing fashion.  I
wouldn't be happy having to add imaginary atoms to my models, but the
representation meets my criteria, and I think it meets yours too.

Dale Tronrud


JPK



On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
   The definition of _atom_site.occupancy is

  The fraction of the atom type present at this site.
  The sum of the occupancies of all the atom types at this site
  may not significantly exceed 1.0 unless it is a dummy site.

When an atom has an occupancy equal to zero that means that the
atom is NEVER present at that site - and that is not what you
intend to say.  Setting the occupancy to zero does not mean that
a full atom is located somewhere in this area.  Quite the opposite.

   (The reference to a dummy site is interesting and implies to
me that mmCIF already has the mechanism you wish for.)

   Having some experience with refining low occupancy atoms and
working with dummy marker atoms I'm quite confident that you can
never define a B factor cutoff that would work.  No matter what
value you choose you will find some atoms in density that refine
to values greater than the cutoff, or the limit you choose is so
high that you will find marker atoms that refine to less than the
limit.  A B factor cutoff cannot work - no matter the value you
choose you will always be plagued with false positives or false
negatives.
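The false positive/false negative trade-off can be made concrete with a sketch.  Suppose we knew which atoms are real and which are markers, and we treat "B > cutoff" as meaning "dummy".  When the two B distributions overlap, no cutoff reaches zero errors.  The function names and the four B values in the usage check are made up for illustration; they are not refinement results.

```python
def cutoff_errors(atoms, cutoff):
    """Count misclassifications when 'B > cutoff' is taken to mean 'dummy'.

    atoms: iterable of (b_factor, is_dummy) pairs.
    Returns (false_positives, false_negatives):
      false positive = real atom in density whose B exceeds the cutoff,
      false negative = dummy marker atom whose B is at or below the cutoff.
    """
    fp = sum(1 for b, dummy in atoms if not dummy and b > cutoff)
    fn = sum(1 for b, dummy in atoms if dummy and b <= cutoff)
    return fp, fn


def min_total_errors(atoms):
    """Smallest achievable FP + FN over every distinct cutoff."""
    bs = sorted(b for b, _ in atoms)
    # The classification only changes at observed B values, so these
    # candidates (plus one below the minimum) cover every possible cutoff.
    candidates = [bs[0] - 1.0] + bs
    return min(sum(cutoff_errors(atoms, c)) for c in candidates)
```

With one real atom at B=80 and one marker at B=60, every cutoff misclassifies at least one of them.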

   If you really want to stuff this bit into one of these fields
you have to go all out.  Set the occupancy of a marker atom to -99.99.
This will unambiguously mark the atom as an imaginary one.  This
will, of course, break every program that reads PDB format files,
but that is what should happen in any case.  If you change the
definition of the columns in the file you must mandate that all
programs be upgraded to recognize the new definitions.  I don't
know how you can do that other than ensuring that the change will
cause programs to cough.  To try to slide it by with a magic value
that will be silently accepted by existing programs is to beg for
bugs and subtle side-effects.
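What "causing programs to cough" means in practice: a reader that validates the occupancy field rejects a sentinel like -99.99 outright instead of feeding it into later calculations.  This is a sketch of such a check, assuming a plain [0, 1] range test; real programs may validate differently or not at all.

```python
def parse_occupancy(field):
    """Parse the occupancy field (PDB columns 55-60), rejecting impossible values.

    A strict reader like this raises ValueError on a sentinel such as -99.99;
    a lenient one that just calls float() would pass it through silently.
    """
    occ = float(field)
    if not 0.0 <= occ <= 1.0:
        raise ValueError(f"occupancy {occ} outside [0, 1]")
    return occ
```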

   Good luck getting the maintainers of the mmCIF standard to accept
a magic value in either of these fields.

   How about this: We already have the keywords ATOM and HETATM
(and don't ask me why we have two).  How about we create a new
record in the PDB format, say IMGATM, that would have all the
fields of an ATOM record but would be recognized as whatever the
marker is for "dummy" atoms in the current mmCIF?  Existing programs
would completely ignore these atoms, as they should until they are
modified to do something reasonable with them.  Those of us who
have no use for them can either use a switch in the program to
ignore them or just grep them out of the file.  Someone could write
a program that would take a model with only ATOM and HETATM records
and fill out all the desired IMGATM records (Let's call that program
WASNIAHC, everyone would remember that!).
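The bookkeeping half of a WASNIAHC-like tool is simple: compare what was modelled against the expected atom list for each residue type and report what is missing.  This hypothetical sketch covers only that step and only a few long side chains; actually placing the missing atoms (for example from a rotamer library) is deliberately left out.

```python
# Heavy-atom names for a few long side chains (PDB naming); extend as needed.
SIDE_CHAIN_ATOMS = {
    "LYS": ["CB", "CG", "CD", "CE", "NZ"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
    "GLU": ["CB", "CG", "CD", "OE1", "OE2"],
    "GLN": ["CB", "CG", "CD", "OE1", "NE2"],
}
MAIN_CHAIN_ATOMS = ["N", "CA", "C", "O"]


def missing_atoms(residues):
    """List atoms a WASNIAHC-like tool would have to add as IMGATM records.

    residues: dict mapping (chain, res_seq) -> (res_name, set of atom names
    actually modelled as ATOM/HETATM).  Returns (chain, res_seq, res_name,
    atom_name) tuples; coordinates are not generated here.
    """
    missing = []
    for (chain, res_seq), (res_name, present) in sorted(residues.items()):
        expected = MAIN_CHAIN_ATOMS + SIDE_CHAIN_ATOMS.get(res_name, [])
        missing.extend((chain, res_seq, res_name, name)
                       for name in expected if name not in present)
    return missing
```

For a lysine truncated at CG this reports CD, CE and NZ, which is exactly the "trunco-lysine" case discussed elsewhere in the thread.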

   This solution is unambiguous.  It can be represented in current
mmCIF, I think.  The PDB could run WASNIAHC themselves after deposition
but before acceptance by the depositor so people like me would not
have to deal with them during refinement but would be able to see
them before our precious works of art are unleashed on the world.

   Seems like a win-win solution to me.

Dale Tronrud


On 4/3/2011 9:17 PM, Jacob Keller wrote:

Well, what about getting the default settings on the major molecular
viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
While the B cutoff would still be tricky, I assume we could eventually
come to consensus on some reasonable cutoff (2 sigma from the mean?),
and then this approach would allow each free-spirited crystallographer
to keep his own preferred method of dealing with these troublesome
sidechains and nary a novice would be led astray....
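The proposed "novice mode" filter is easy to state in code.  This sketch reads "2 sigma from the mean" as mean + 2 standard deviations of the model's B factors (population SD), which is one interpretation rather than an agreed convention, and hides zero-occupancy atoms as well.

```python
from statistics import mean, pstdev


def novice_mode_visible(atoms, n_sigma=2.0):
    """Return the atoms a 'novice mode' viewer would draw.

    atoms: list of (atom_id, occupancy, b_factor).
    Hides atoms with zero occupancy or with B above mean + n_sigma * SD.
    """
    bs = [b for _id, _occ, b in atoms]
    cutoff = mean(bs) + n_sigma * pstdev(bs)
    return [a for a in atoms if a[1] > 0.0 and a[2] <= cutoff]
```

Note that the cutoff depends on the whole model's B distribution, so the same atom can be shown in one structure and hidden in another.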

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:

Most non-structural users are familiar with the sequence of the proteins
they are studying, and most software does at least display residue identity
if you select an atom in a residue, so usually it is not necessary to do any
cross checking besides selecting an atom in the residue and seeing what its
residue name is.  The chance of somebody misinterpreting a truncated Lys as
Ala is, in my experience, much much lower than the chance they will trust
the xyz coordinates of atoms with zero occupancy or high B factors.

What worries me the most is somebody designing a whole biological
experiment around an over-interpretation of details that are implied by xyz
coordinates of atoms, even if those atoms were not resolved in the maps.
When this sort of error occurs it is a level of pain and wasted effort that
makes the "pain" associated with having to build back in missing side chains
look completely trivial.

As long as the PDB file format is the way users get structural data,
there is really no good way to communicate "atom exists with no reliable
coordinates" to the user, given the diversity of software packages out there
for reading PDB files and the historical lack of any standard way of dealing
with this issue.  Even if the file format is hacked there is no way to force
all the existing software out there to understand the hack.  A file format
that isn't designed with this sort of feature from day one is not going to
be fixable as a practical matter after so much legacy code has accumulated.

-Eric



On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:

To the delete-the-atom-nik's: do you propose deleting the whole
residue or just the side chain? I can understand deleting the whole
residue, but deleting only the side chain seems to me to be placing a
stumbling block also, and even possibly confusing for an experienced
crystallographer: the .pdb says "lys" but it looks like an ala? Which
is it? I could imagine a lot of frustration-hours arising from this
practice, with people cross-checking sequences, looking in the methods
sections for mutations...

JPK
