It's nice to see that this discussion pops up every two years or so with exactly the same arguments.... :)

My vote (as always) is for leaving the atoms of disordered side chains in with high B values; the B values are part of the model. It's up to the popular biologists' visualization software out there to display these models properly. I'm sure we can use all kinds of nice 4D blurry renderings of these disordered atoms nowadays.

Flip

On 4/4/2011 23:15, Jacob Keller wrote:
I like your IMGATM proposal, but wouldn't it also potentially break
some of the programs? Also--and this is a problem with deleting only
sidechain atoms in general--it seems that many, myself included, might
totally miss that an apparent "alanine" is really a trunco-lysine.
What I like is that it does get around the problem of people
over-interpreting bogus sidechains, but it falls short, perhaps, in
misleading people about what residue is there. I, for one, would not
feel that I had to click on all the alanines in a model to verify that
they were not lysines, and would be surprised and puzzled for a while
about why this ala said lys when I clicked on it. Wouldn't you be
surprised? (Well, maybe not after this thread...)

JPK



On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
   The definition of _atom_site.occupancy is

  The fraction of the atom type present at this site.
  The sum of the occupancies of all the atom types at this site
  may not significantly exceed 1.0 unless it is a dummy site.

When an atom has an occupancy equal to zero that means that the
atom is NEVER present at that site - and that is not what you
intend to say.  Setting the occupancy to zero does not mean that
a full atom is located somewhere in this area.  Quite the opposite.

   (The reference to a dummy site is interesting and implies to
me that mmCIF already has the mechanism you wish for.)

   Having some experience with refining low occupancy atoms and
working with dummy marker atoms I'm quite confident that you can
never define a B factor cutoff that would work.  No matter what
value you choose you will find some atoms in density that refine
to values greater than the cutoff, or the limit you choose is so
high that you will find marker atoms that refine to less than the
limit.  A B factor cutoff cannot work - no matter the value you
choose you will always be plagued with false positives or false
negatives.

   If you really want to stuff this bit into one of these fields
you have to go all out.  Set the occupancy of a marker atom to -99.99.
This will unambiguously mark the atom as an imaginary one.  This
will, of course, break every program that reads PDB format files,
but that is what should happen in any case.  If you change the
definition of the columns in the file you must mandate that all
programs be upgraded to recognize the new definitions.  I don't
know how you can do that other than ensuring that the change will
cause programs to cough.  To try to slide it by with a magic value
that will be silently accepted by existing programs is to beg for
bugs and subtle side-effects.
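
The "cough" described above could look like the following in a reader that
validates the occupancy field. This is only a minimal sketch: it assumes the
fixed-column PDB layout (occupancy in columns 55-60), and the function name
read_occupancy and the strict 0-to-1 range check are illustrative
assumptions, not the behaviour of any existing program. Note that -99.99
fills the six-column field exactly.

```python
def read_occupancy(line):
    """Parse the occupancy from one ATOM/HETATM record (columns 55-60).

    Hypothetical strict reader: it refuses values outside 0..1 instead of
    silently accepting a magic marker such as -99.99.
    """
    occ = float(line[54:60])
    if not 0.0 <= occ <= 1.0:
        raise ValueError("occupancy %.2f is outside 0..1: %r" % (occ, line))
    return occ
```

A lenient reader would simply return the float, which is exactly the silent
acceptance the paragraph above warns against.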

   Good luck getting the maintainers of the mmCIF standard to accept
a magic value in either of these fields.

   How about this: We already have the keywords ATOM and HETATM
(and don't ask me why we have two).  How about we create a new
record in the PDB format, say IMGATM, that would have all the
fields of an ATOM record but would be recognized as whatever the
marker is for "dummy" atoms in the current mmCIF?  Existing programs
would completely ignore these atoms, as they should until they are
modified to do something reasonable with them.  Those of us who
have no use for them can either use a switch in the program to
ignore them or just grep them out of the file.  Someone could write
a program that would take a model with only ATOM and HETATM records
and fill out all the desired IMGATM records (Let's call that program
WASNIAHC, everyone would remember that!).
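
As a rough illustration of the "ignore them or grep them out" idea, here is
a minimal Python sketch. IMGATM is the hypothetical record name proposed
above and does not exist in the PDB format; the function names
strip_imaginary and split_real_and_imaginary are likewise made up for this
example.

```python
def strip_imaginary(pdb_text, record="IMGATM"):
    """Drop every line whose record name (columns 1-6) is the hypothetical
    IMGATM, roughly what `grep -v '^IMGATM'` would do."""
    kept = [ln for ln in pdb_text.splitlines() if ln[:6].rstrip() != record]
    return "\n".join(kept)


def split_real_and_imaginary(pdb_text):
    """Separate ATOM/HETATM records from IMGATM records.

    Any other record type is skipped, which is how a program that only
    looks for ATOM and HETATM would already treat an unknown record.
    """
    real, imaginary = [], []
    for ln in pdb_text.splitlines():
        rec = ln[:6].rstrip()
        if rec in ("ATOM", "HETATM"):
            real.append(ln)
        elif rec == "IMGATM":
            imaginary.append(ln)
    return real, imaginary
```

A refinement program would presumably work only on the first list, while a
viewer that has been updated could draw the second list differently.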

   This solution is unambiguous.  It can be represented in current
mmCIF, I think.  The PDB could run WASNIAHC themselves after deposition
but before acceptance by the depositor so people like me would not
have to deal with them during refinement but would be able to see
them before our precious works of art are unleashed on the world.

   Seems like a win-win solution to me.

Dale Tronrud


On 4/3/2011 9:17 PM, Jacob Keller wrote:

Well, what about getting the default settings on the major molecular
viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
While the b cutoff would still be tricky, I assume we could eventually
come to consensus on some reasonable cutoff (2 sigma from the mean?),
and then this approach would allow each free-spirited crystallographer
to keep his own preferred method of dealing with these troublesome
sidechains and nary a novice would be led astray....
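
To make the "novice mode" suggestion concrete, here is a minimal sketch of
the selection rule: hide an atom if its occupancy is zero or its B factor
lies more than two standard deviations above the mean B of the model. It
assumes fixed-column PDB records (occupancy in columns 55-60, B factor in
columns 61-66); the function names and the choice of the population
standard deviation are assumptions made for the example, not an agreed
convention.

```python
import statistics


def parse_atoms(pdb_text):
    """Return (line, occupancy, b_factor) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            occ = float(line[54:60])
            b = float(line[60:66])
            atoms.append((line, occ, b))
    return atoms


def novice_mode_hidden(pdb_text, n_sigma=2.0):
    """List the records a viewer would hide under the proposed rule."""
    atoms = parse_atoms(pdb_text)
    if not atoms:
        return []
    b_values = [b for _, _, b in atoms]
    # Cutoff = mean B + n_sigma * population standard deviation of B.
    cutoff = statistics.mean(b_values) + n_sigma * statistics.pstdev(b_values)
    return [line for line, occ, b in atoms if occ == 0.0 or b > cutoff]
```

Because the cutoff is computed per model, the same atom could be shown in
one structure and hidden in another, which is part of why any B cutoff
remains tricky.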

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:

Most non-structural users are familiar with the sequence of the proteins
they are studying, and most software does at least display residue identity
if you select an atom in a residue, so usually it is not necessary to do any
cross checking besides selecting an atom in the residue and seeing what its
residue name is.  The chance of somebody misinterpreting a truncated Lys as
Ala is, in my experience, much much lower than the chance they will trust
the xyz coordinates of atoms with zero occupancy or high B factors.

What worries me the most is somebody designing a whole biological
experiment around an over-interpretation of details that are implied by xyz
coordinates of atoms, even if those atoms were not resolved in the maps.
  When this sort of error occurs it is a level of pain and wasted effort that
makes the "pain" associated with having to build back in missing side chains
look completely trivial.

As long as the PDB file format is the way users get structural data,
there is really no good way to communicate "atom exists with no reliable
coordinates" to the user, given the diversity of software packages out there
for reading PDB files and the historical lack of any standard way of dealing
with this issue.  Even if the file format is hacked there is no way to force
all the existing software out there to understand the hack.  A file format
that isn't designed with this sort of feature from day one is not going to
be fixable as a practical matter after so much legacy code has accumulated.

-Eric



On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:

To the delete-the-atom-niks: do you propose deleting the whole
residue or just the side chain? I can understand deleting the whole
residue, but deleting only the side chain seems to me to be placing a
stumbling block also, and even possibly confusing for an experienced
crystallographer: the .pdb says "lys" but it looks like an ala? Which
is it? I could imagine a lot of frustration-hours arising from this
practice, with people cross-checking sequences, looking in the methods
sections for mutations...
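
One way to spare people that cross-checking would be a small script that
reports residues whose name promises more atoms than the file contains.
The sketch below assumes fixed-column PDB ATOM records and covers only four
residue types; the table EXPECTED and the function name truncated_residues
are made up for this example, and a real tool would need the full set of
residues plus handling of alternate conformations and terminal atoms.

```python
from collections import defaultdict

# Expected heavy-atom names for a few residue types (illustrative subset).
EXPECTED = {
    "ALA": {"N", "CA", "C", "O", "CB"},
    "LYS": {"N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"},
    "GLU": {"N", "CA", "C", "O", "CB", "CG", "CD", "OE1", "OE2"},
    "ARG": {"N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"},
}


def truncated_residues(pdb_text):
    """Map (chain, residue number, residue name) to its missing atom names."""
    seen = defaultdict(set)
    names = {}
    for ln in pdb_text.splitlines():
        if not ln.startswith("ATOM  "):
            continue
        key = (ln[21], ln[22:27])          # chain ID, resSeq + insertion code
        seen[key].add(ln[12:16].strip())   # atom name, columns 13-16
        names[key] = ln[17:20].strip()     # residue name, columns 18-20
    report = {}
    for key, atoms in seen.items():
        expected = EXPECTED.get(names[key])
        if expected is None:
            continue                       # residue type not in the table
        missing = expected - atoms
        if missing:
            report[(key[0], key[1].strip(), names[key])] = sorted(missing)
    return report
```

A lysine deposited with only backbone atoms and CB would be reported with
CG, CD, CE and NZ missing, so the "ala that says lys" is flagged without
anyone having to click on it.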

JPK








